The Sky Is No Limit: 13 Research Teams Compute in the Clouds
NSF awards nearly $4.5 million to innovative projects to participate in NSF / Microsoft cloud computing collaboration
The National Science Foundation (NSF) today announced the awardees who will be funded through the collaborative cloud computing agreement that Microsoft Corp. and NSF announced in February 2010.
The agreement will offer the award recipients--individual researchers and research groups--free access to advanced cloud computing resources that enable faster, less expensive processing across geographically distributed data centers. By extending the capabilities of powerful, easy-to-use PC applications via Microsoft cloud services, the program will help broaden research capabilities, foster collaborative research communities, and accelerate scientific discovery.
For a two-year period, Microsoft will provide the 13 cloud computing research projects, identified by NSF through its rigorous peer review process, with access to Windows Azure--a cloud computing platform that provides on-demand computing and storage to host, scale and manage Web applications on the Internet through Microsoft data centers--along with a support team to help researchers quickly integrate cloud technology into their research. Microsoft researchers and developers will work with grant recipients to equip them with a set of common tools, applications and data collections that can be shared with the broad academic community.
"An object by itself is intensely uninteresting," Grady Booch famously said. Never have his words been more prescient than today in the data-dominated, cyber-social-physical networked world in which we live, scientists work, and collaboration, data integration and association are at a premium. Yet analyzing and synthesizing this mass of data within and across domains--a critical tool for scientific advancement--is fraught with challenges. The goal of the new program is to make simple yet powerful tools readily available to researchers to use to extract insights by mining and combining diverse data sets.
"Cloud computing represents a new generation of technology in this new era of science, one of data-driven exploration. It creates precedent-setting opportunities for discovery," said Farnam Jahanian, assistant director of the NSF Directorate for Computer and Information Science and Engineering. "We are especially proud of these excellent projects, led by top researchers at universities throughout the country that we think will best capitalize on the NSF/Microsoft partnership. They will use the resources Microsoft will provide to explore and experiment with cloud computing in order to address some of society's greatest challenges."
Daniel Reed, Microsoft's corporate vice president for the eXtreme Computing Group and Technology Strategy and Policy, said, "Increasingly, the important scientific questions lie at the intersections of traditional disciplines, so I'm pleased to see such technical diversity and interdisciplinarity among this list of award recipients. They illustrate the range of scientific fields that can take advantage of the cloud for rich data analytics on large data sets via powerful, easy to use tools. This lets scientists be scientists, enabling them to focus on discovery rather than computing infrastructure. I look forward to seeing the results of this potentially groundbreaking work."
The following projects, each led by the named principal investigator (PI), have received NSF funding to participate in the NSF / Microsoft cloud computing collaboration:
Cornell University (Kenneth Birman) - Building Scalable Trust in Cloud Computing
A growing spectrum of societally critical, highly sensitive applications is shifting toward cloud computing to benefit from lower costs. These include applications related to medicine, from treatment to surgical procedures. Lingering issues have yet to be addressed, including high availability, secure access, fault tolerance and the preservation of privacy and real-time responsiveness. These researchers will explore the consistency issue as cloud computing applications are applied to large-scale systems, which will contribute toward a scientific foundation for scalable trust in cloud computing.
J. Craig Venter Institute, Inc. (Andrey Tovchigrechko) - Bettering Interactive Protein-Protein Docking
Understanding the detailed mechanism of protein-protein interactions is essential in many areas of molecular biology. Modeling protein-protein interactions in three dimensions--or "protein-protein docking," as it is called--is computationally intensive. Due to the inherent complexity of the problem, protein-protein docking may often suggest only a set of putative complex structural arrangements. However, when combined with certain molecular experiments, computational docking can be a powerful tool. This project will use the Azure cloud platform to address two restrictions of existing protein-protein docking paradigms: insufficient scalability and lack of interactivity.
State University of New York (SUNY) at Buffalo (Tevfik Kosar) - Enhancing Stork Data Scheduler for Azure
Stork Data Scheduler has been actively used in many application areas, including coastal hazard prediction and storm surge modeling; oil flow and reservoir uncertainty analysis; numerical relativity and black hole collisions; educational video processing and behavioral assessment; digital sky imaging; and multiscale computational fluid dynamics. These researchers aim to further develop and enhance the Stork Data Scheduler to support the Azure cloud computing environment, and to mitigate the end-to-end data handling bottleneck in data-intensive cloud computing applications. The result, the team predicts, will be to dramatically change how domain scientists perform their research by facilitating sharing of experiences and raw data.
University of California, San Diego (Kenneth Yocum) - Utilizing Continuous Bulk Processing
Today's rapid data "deluge" in the scientific enterprise gives rise to many exciting data mining opportunities. This project will explore an alternative data processing architecture that fundamentally improves computing efficiency to reduce costs and provide enhanced data mining capabilities for cloud computing--continuous bulk processing. A key facet of the approach is to allow analytics to simply be updated, not recomputed, when new data arrives. The work will explore the ultimate reach of this incremental approach, determining how users may trade cost for performance for incremental analytics.
University of Colorado Boulder (Richard Han) - Enabling Mobile Cloud Computing
The main goal of this project is to define and develop a common cloud computing framework that can be used to stimulate the design and development of the next generation of mobile applications. Future mobile applications will likely become increasingly context-specific and demand ever more resources from the cloud while insisting on real-time performance. The project identifies a representative next-generation mobile application called "VideoLense tricorder." Using this application, and addressing the new and unique challenges it poses compared with traditional cloud services, the team will investigate key research questions to enable such futuristic mobile applications.
University of Michigan, Ann Arbor (Qiaozhu Mei) - Refining Language Models using Web-scale Language Networks
Rich online communities continuously produce an overload of text data in which interesting information--topics, events, opinions, behaviors, intents, rumors, needs, even scientific discoveries--may be buried. Statistical language models not only enable the efficient retrieval of that information, but also enable the discovery of interesting patterns from the text content, affording insight into the people who created the content. The quality and performance of language models are usually limited by the sparseness of data, mismatch of context and neglect of connections between certain words and phrases. These researchers will refine these models by using large, Web-scale datasets and the power of the cloud. Experimenting in health informatics, they aim to glean valuable insights on how to more effectively seek and route information.
University of North Carolina at Charlotte (Zhengchang Su) - Predicting Transcription Factor Binding Sites for Genes
Although huge advances have been made in identifying the gene-coding DNA sequences in bacterial genomes using computational methods, the understanding of regulatory DNA sequences is limited due to the lack of efficient computational and experimental methods for predicting them. These researchers will capitalize on the recently reduced time and cost of sequencing a genome to improve knowledge of gene regulatory systems in single-celled organisms. The knowledge garnered will amplify scientists' understanding of biology, with predicted impacts on applications in broad areas such as renewable energy production, environmental protection and disease prevention.
University of South Carolina Research Fund (Jonathan Goodall) and the University of Virginia (Marty A. Humphrey) - Managing Large Watershed Systems
Understanding hydrologic systems at the scale of large watersheds is critically important to society when faced with extreme events, such as floods and droughts, or with concern about water quality. Climate change and increasing population are further complicating watershed-scale prediction by placing additional stress and uncertainty on future hydrologic system conditions. This project advances hydrologic science and water resource management by creating and using a cloud-enabled hydrologic model and data processing workflows to examine the Savannah River Basin in the Southeastern United States. This will provide the detail and scale necessary to address fundamental research questions related to quantifying impacts of climate change on water resources.
University of Southern California (Viktor Prasanna) - Tackling Large Scale Graph Problems
The adoption of cloud computing has been impeded by concerns related to corporate governance, data privacy and security, and the Health Insurance Portability and Accountability Act (HIPAA), which mandates data storage and access auditing. Managing private clouds that offer scalability is very expensive. Integrating public and private clouds seamlessly is not easy. Even more daunting are the complexities of developing applications that understand the cloud programming paradigm and can best derive the benefits of the cloud infrastructure. This project will devise a framework to address these challenges in order to enhance the availability and efficiency of the cloud. The researchers plan to demonstrate their framework using applications in the areas of real-time search and ranking and semantic association discovery for healthcare and energy informatics.
University of Texas at Austin (Michael Walfish) - Storing Data with Minimal Trust
Researchers will work to determine how to build a cloud storage service under minimal trust assumptions--in other words, without the clients having to assume that the providers will always operate correctly. Issues particularly relevant to cloud storage include those involving storage service providers operated by a party other than the data owner, software bugs, correlated manufacturing defects, misconfigured servers, operator error, malicious insiders, bankruptcy, fires and more.
University of Washington (Magdalena Balazinska) - Understanding Relational Data Markets
While today's cloud computing systems offer simple pricing schemes for storage and computing resources, the economics of data sharing are poorly understood and only coarsely supported. This research endeavor will develop models and infrastructures to establish relational data markets in the cloud and build a prototype system to implement these models and support both data pricing and ad hoc data sharing. This system will enable users to sell their data in the cloud, choosing how to price both the data and query results. It will enable users to buy and combine data from different providers, possibly reselling it in turn. Finally, it will support efficient and fair data sharing between individual scientists.
Virginia Tech (Wuchun Feng) - Conducting Intensive Biocomputing
With DNA sequencers in the life sciences able to generate a terabyte--or one trillion bytes--of data a minute, the size of DNA sequence databases will increase 10-fold every 18 months. This will ultimately create a need for computational power to increase 50 times faster than Moore's Law (which holds that the number of transistors that can be placed inexpensively on an integrated circuit will double approximately every two years). Thus, to keep pace, scientists and engineers must increasingly rely on high-performance computing (HPC), which is often costly and difficult to access and use. This research team aims to create a new generation of efficient data management and analysis software for large-scale, data-intensive scientific applications in the cloud. They will leverage recent experience in delivering reliable computing over volatile cloud resources to further enhance the robustness of data management and analysis software. They will strive to eliminate the need to assume "no hardware failures" or "very infrequent failures," as is the case with traditional HPC data-management techniques.
Virginia Tech (Kwa-Sur Tam) - Effectively and Widely Using Renewable Energy Sources
An accurate forecast is key to effective utilization of weather-dependent renewable energy sources, such as wind and solar. Weather forecasting is a complex and data-intensive computing process. This project seeks to develop the Forecast-as-a-Service (FaaS) framework in order to: enable the combined use of different types of data from different sources for new prediction models to enhance the synthesis of more accurate forecasts; and support on-demand delivery of forecasts of different types and at different levels of detail for varying prices to accommodate renewable energy users with different needs and varying budgets.
About Microsoft Research
Founded in 1991, Microsoft Research is dedicated to conducting both basic and applied research in computer science and software engineering. More than 850 Ph.D. researchers focus on more than 55 areas of computing and openly collaborate with leading academic, government, and industry researchers to advance the state of the art of computing, help fuel the long-term growth of Microsoft and its products, and solve some of the world's toughest problems through technological innovation. More information can be found on the Microsoft Research and Azure Research Engagement pages on the Microsoft website.
Founded in 1975, Microsoft (Nasdaq "MSFT") is the worldwide leader in software, services and solutions that help people and businesses realize their full potential.
The National Science Foundation (NSF) is an independent federal agency that supports fundamental research and education across all fields of science and engineering. In fiscal year (FY) 2017, its budget is $7.5 billion. NSF funds reach all 50 states through grants to nearly 2,000 colleges, universities and other institutions. Each year, NSF receives more than 48,000 competitive proposals for funding and makes about 12,000 new funding awards.