Aidong Zhang IIS Div Of Information & Intelligent Systems
CSE Direct For Computer & Info Scie & Enginr
April 1, 2012
March 31, 2017 (Estimated)
Michael Franklin (Principal Investigator)
Scott Shenker (Co-Principal Investigator)
Alexandre Bayen (Co-Principal Investigator)
Ion Stoica (Co-Principal Investigator)
Michael Jordan (Co-Principal Investigator)
University of California-Berkeley
Sponsored Projects Office
INFORMATION TECHNOLOGY RESEARC,
Making Sense at Scale with Algorithms, Machines, and People
University of California, Berkeley
The world is increasingly awash in data. As more and more human activities move online, and as a growing array of connected devices becomes an integral part of daily life, the amount and diversity of data being generated continue to explode. According to one estimate, more than a zettabyte (one billion terabytes) of new information was created in 2010 alone, with the rate of new information increasing by roughly 60% annually. This data takes many forms: free-form tweets, text messages, blogs and documents; structured streams produced by computers, sensors and scientific instruments; and media such as images and video.
Buried in this flood of data are the keys to solving huge societal problems, improving productivity and efficiency, creating new economic opportunities, and unlocking new discoveries in medicine, science and the humanities. However, raw data alone is not sufficient; we can only make sense of our world by turning this data into knowledge and insight. This challenge, known as the Big Data problem, cannot be solved by the straightforward application of current data analytics technology, due to the sheer volume and diversity of information. Rather, solving it requires throwing away old preconceptions about data management and breaking down many of the traditional boundaries in and around Computer Science and related disciplines.
The Algorithms, Machines, and People (AMP) expedition at the University of California, Berkeley is addressing this challenge head-on. AMP is a collaboration of researchers with a wide range of data-related expertise, committed to working together to create a new data analytics paradigm. AMP will produce fundamental innovations in, and a deep integration of, three very different types of computational resources:
1. Algorithms: new machine-learning and analysis methods that can operate at large scale and can give flexible tradeoffs between timeliness, accuracy, and cost.
2. Machines: systems infrastructure that allows programmers to easily harness the power of scalable cloud and cluster computing for making sense of data.
3. People: crowdsourcing human activity and intelligence to create hybrid human/computer solutions to problems not solvable by today's automated data analysis technologies alone.
AMP research will be guided and evaluated through close collaboration with domain experts in key societal applications including: cancer genomics and personalized medicine, large-scale sensing for traffic prediction and environmental monitoring, urban planning, and network security. Advances pioneered by the project will be made widely available through the development of the Berkeley Data Analysis System (BDAS), an open source software platform that seamlessly blends Algorithm, Machine and People resources to solve big data problems.
For more information visit http://amplab.cs.berkeley.edu
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
Peter Bailis, Shivaram Venkataraman, Michael Franklin, Joseph M. Hellerstein, Ion Stoica. "Probabilistically Bounded Staleness for Practical Partial Quorums," Proceedings of the VLDB Endowment 2012, v.5, 2012, p. 776-787.
Jiannan Wang, Tim Kraska, Michael Franklin, Jianhua Feng. "CrowdER: Crowdsourcing Entity Resolution," Proceedings of the VLDB Endowment 2012, v.5, 2012, p. 1483-1494.
Kay Ousterhout, Aurojit Panda, Joshua Rosen, Shivaram Venkataraman, Reynold Xin, Sylvia Ratnasamy, Scott Shenker, Ion Stoica. "The Case for Tiny Tasks in Compute Clusters," Proceedings of the 14th USENIX Conference on Hot Topics in Operating Systems, 2013, p. 14.
Reynold Xin, Joseph Gonzalez, Michael Franklin, Ion Stoica. "GraphX: A Resilient Distributed Graph System on Spark," Proceedings of the First International Workshop on Graph Data Management Experience and Systems (GRADES 2013), 2013.
Rishabh Iyer, Stefanie Jegelka, Jeff Bilmes. "Fast Semidifferential-Based Submodular Function Optimization," Proceedings of the International Conference on Machine Learning (ICML), 2013.
Timothy Hunter, Tathagata Das, Matei Zaharia, Pieter Abbeel, Alexandre M. Bayen. "Large-Scale Estimation in Cyberphysical Systems Using Streaming Data: A Case Study with Arterial Traffic Estimation," IEEE Transactions on Automation Science and Engineering, v.10, 2013, p. 884.
Peter Bailis, Kyle Kingsbury. "The Network is Reliable: An informal survey of real-world communications failures," Communications of the ACM, v.12, 2014, p. http://qu.