Dr. France A. Córdova
2015 Big Data in Biomedicine: Driving Innovation for a Healthier World
May 21, 2015
[Slide #1: Title slide - Big Data at the National Science Foundation]
Thank you, Carlos. Good morning, everyone. It's really a pleasure to be back here at Stanford, where, as Carlos said, I spent my undergraduate career and had a wonderful time. It is exciting to hear of developments in medicine that are being propelled by big data.
I would like to broaden the discussion and talk about big data in other areas of science. The National Science Foundation supports research in all areas of engineering and science, and big data is changing how scientists perform their work and how future scientists are being educated.
As Carlos just mentioned, I am an astrophysicist by training and in that community, we are no strangers to the concept of big data. The universe, after all, is a big place. We have been dealing with big science for many years, including big datasets, big simulations, and sometimes, huge collaborations. The impact of this on our knowledge of the universe has been tremendous. And I know that the potential impact of big data in biomedicine to improve healthcare and quality of life is also huge.
I think we all recognize the importance of big data for transforming science and benefitting society, but let me say a few words about this.
We are moving from the Information Age to the Age of Analytics; from the Industrial Revolution to the Data Revolution. Advances in our ability to store, integrate, and extract meaning and information from increasingly large-scale and diverse data sets are critical to accelerating the pace of discovery in every science and engineering discipline. From new insights about clinical decision-making to new ways to mitigate and respond to natural disasters or to develop new strategies for effective learning and education--the impact and opportunities for data-driven discovery are remarkable.
Big data also has the potential to solve some of the Nation's most pressing challenges--not just in healthcare and medicine, but also in education, energy, transportation, commerce, disaster prevention and mitigation, and cyber and national security--indeed, in almost every facet of society--yielding enormous societal benefit and enhanced quality of life.
[Slide #2: Training Computers to Analyze Breast Cancer ]
I want to begin by illustrating the incredible power of data by telling you a story about the work of Daphne Koller, who is here at Stanford and has been involved in the big data movement before it was big.
You may know Daphne from her dedication to education--as the President of Coursera, an education platform that offers free online courses to anyone who wants to learn, while also enabling researchers to mine data on how people learn; from her leadership of the university's computer science summer research experience program for undergraduates, which has trained more than 500 students; or as the founder of Stanford's Biomedical Computation major.
Not only has Daphne been an advocate for big data's potential for education and the science of learning, but she is also a leading computer scientist turned biologist. Daphne and a team of computer scientists and pathologists from Stanford developed a model to teach computers to analyze breast cancer. By measuring numerous novel morphological features of images of breast cancer tissue, the model can more accurately determine cancer diagnosis and prognosis than humans can. An astounding outcome of this work was that the cellular features that were the best predictors of patient survival were not from the cancer tissue itself, but rather from adjacent tissue--something that had gone undetected by pathologists and clinicians.
Harnessing big data has the potential for big impact in all areas of science and engineering, driving new innovations, and addressing some of today's most pressing challenges--enhancing our quality of life.
[Slide #3: NSF's Mission ]
Now, let me take a few minutes to give you the big picture about NSF's role in the national--and global--science and engineering enterprise.
The National Science Foundation is the only federal agency whose mission includes support for research across all fields of science and engineering.
Our mission is "to promote the progress of science, to advance the national health, prosperity and welfare, to secure the national defense."
We engage the research community to develop new fundamental ideas, which are then evaluated by the best researchers through a robust peer review process. These ideas are evaluated by both their intellectual merit and broader impacts.
[Slide #4: NSF by the Numbers ]
NSF operates with an annual budget of over $7 billion. We pride ourselves on being a lean agency, with 94 percent of the budget returned to the nation to fund research and educational activities. We receive about 50,000 proposals each year, and through the merit review process, select about 11,000 of those proposals for funding. Funding support reaches about 2,000 institutions and 300,000 researchers annually.
At NSF, we aren't only where discoveries begin, but also where discoverers begin.
NSF has a proud history of supporting our nation's best and brightest, whose research results have transformed our world--enabling a broad array of innovations, from technologies in our mobile devices like GPS and 4G, the Google search engine, and even the Internet itself, to MRIs, novel drugs for medicine, the bounty and safety of our food supply, the discovery of the Higgs Boson, and the invention of 3-D printing.
We are proud to say that 214 Nobel Prize winners, and thousands of other trailblazers, have transformed our understanding and our world with NSF funding.
In July, we mark the 65th anniversary of the establishment of NSF. The vision for NSF was, in large part, laid out in the report of Vannevar Bush titled Science: The Endless Frontier, which was commissioned by President Franklin D. Roosevelt near the end of the Second World War. The vision was to build a strong foundation of knowledge upon which innovation can thrive through support of basic or fundamental research.
In addition to being committed to fundamental research, we also ensure that we adapt to changes that have occurred in the last 65 years. Big data is an example of such an area--data are motivating a profound transformation in the culture and conduct of scientific research.
[Slide #5: NSF Leadership in the National Big Data R&D Initiative]
To harness the potential of data, NSF has been a leader in helping to coordinate data science research and development activities across the federal government and across the nation.
In spring of 2011, the White House stood up an interagency group with NSF and NIH as the co-chairs and with members from 16 other federal agencies.
A year later, NSF provided leadership for the launch of the National Big Data Research and Development Initiative. Most of the member agencies made commitments to invest in big data research and development, with investments totaling more than $200 million that year. The cornerstone of these announcements was a joint research program between NSF and NIH.
And in November 2013, responding to President Obama's call for "All hands on deck," and ensuring that the Big Data Initiative is not just for federal agencies, but also a national effort, NSF again provided leadership in an event announcing 30 multi-stakeholder partnerships with 90 partners committed to work together to move data to knowledge to action.
[Slide #6: NSF Data Science Strategy]
While interest in data science has risen exponentially over the past several years, NSF has been investing in data science for several decades, well ahead of the curve.
NSF has a bold, comprehensive, five-part approach for this increasingly data-centric world.
- First, we support fundamental research to scale collection, management, storage, and analysis of data.
- Second, NSF invests in research infrastructure at national and international levels to support and serve our research and education communities' needs for data-intensive science and engineering.
- Third, NSF supports education and workforce development to improve the nation's capacity in data science.
- Fourth, NSF enables community building, collaborations, and partnerships to support interdisciplinary science and to accelerate the transition of research into practice.
- And lastly, policy plays an important role, enabling broad dissemination and sharing of data, software, and knowledge.
Let me take a few minutes to elaborate on these areas.
[Slide #7: Fundamental Data Science Research]
NSF investments in fundamental research in data science help to develop and prototype new techniques and technologies to derive knowledge from data.
This research aims to advance the core scientific and technological means of managing, analyzing, visualizing, and extracting useful information from big data sets. This includes facilitating the development of new data analytic techniques and algorithms; scalable, accessible, and sustainable data tools; and large-scale integrated statistical modeling.
I've already told you about Daphne's method to analyze images of breast cancer. Let me also provide an example that focuses on data tools by telling you about Matei Zaharia, who is the latest recipient of the ACM Doctoral Dissertation Award. In his graduate work, he developed a new, resilient way to perform computations on a cluster of computers that matches or exceeds the performance of specialized systems for many applications.
In October 2012, just six months after the launch of the National Big Data R&D Initiative, NSF and NIH announced nearly $15 million in new big data fundamental research projects. By the end of this fiscal year, this program will have invested nearly $80 million in critical techniques and technologies for big data.
[Slide #8: Data-intensive Cyberinfrastructure]
Another pillar in NSF's strategy for data science is supporting cyberinfrastructure to manage, curate, and serve data to research communities. For nearly four decades, NSF has supported advanced computing infrastructure for the nation's research and education communities. While this began initially as support for supercomputer centers--an effort that has been tremendously successful--we are now planning for a future that embraces both high-performance computing as well as data-intensive computing.
To illustrate the range and scope of some of today's cutting-edge data- and computationally-intensive cyberinfrastructure supported by NSF, let me give you just a few examples.
- In astronomy, the Sloan Digital Sky Survey collected more data in its first few weeks of operation than had been amassed in the entire previous history of astronomy; and the Large Synoptic Survey Telescope, which is being built in Chile and is due to begin operations around 2020, will be able to amass, in just one week, double the quantity of data that the Sloan Digital Sky Survey collects in a decade.
- In ecology, NEON--a continental-scale research instrument that integrates sensor networks, biological assessments (including natural history archive information), and remote sensing data--is poised to revolutionize our understanding of the natural world around us.
[Slide #9: Cyberinfrastructure for Brain Research]
- And the proposed National Brain Observatory would help neuroscientists collect, standardize, manage, and analyze the large amounts of data that will result from research attempting to understand how the brain functions.
- As a testament to the potential of a tool like this, let me show you an NSF video featuring neuroscientists here at Stanford, led by Russell Poldrack, who have launched new infrastructure to enable fMRI data to be shared easily and securely in a standardized format.
[Video (3:09): https://www.nsf.gov/news/special_reports/science_nation/openfmri.jsp?WT.mc_id=USNSF_51]
[Slide #10: Education & Workforce Development]
Our investments in Big Data research and infrastructure are accompanied by investments in Big Data education and workforce development. We aim to improve the nation's capacity in data science by investing in the development of human capital.
A 2011 McKinsey report predicted, "By 2018 the United States alone faces a shortage of 140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of big data."
[Slide #11: Education & Workforce Development (2)]
In this era of big data, where data analytics is a core skill, we must strive to train "Pi People"--a term coined by Alex Szalay of Johns Hopkins University. The term refers to individuals with one pillar of knowledge in their own field, and a second pillar in the ability to apply the statistical and computational methods that drive modern data-intensive, interdisciplinary science and engineering.
In order to ensure that graduate students develop the skills, knowledge, and competencies needed to pursue a range of STEM careers, NSF launched the NSF Research Traineeship program and singled out data-enabled science and engineering as the priority research theme for its first two years.
[Slide #12: Collaborations & Partnerships]
Nurturing interdisciplinary science communities and meaningful collaborations is an important, fourth part of NSF's strategy for addressing big data. It is also an essential component in realizing the full power and benefits of big data approaches--whether in science, business, government, or other societal impact areas.
NSF is seeking to enable research communities to develop new visions, teams, and capabilities to accelerate discovery across all areas of science and engineering as well as to accelerate the transition of discoveries into practice.
By promoting strong connections between academia and industry, NSF further enhances its research portfolio in Big Data with fundamental concepts and new ideas that are directly relevant to the commercial sector.
NSF is currently in the process of establishing a national network of Big Data Regional Innovation Hubs. The vision is to form connected, multi-stakeholder regional consortia with members from academia, industry, government, and the nonprofit sector to facilitate the transformation of data to knowledge to action, while also creating and maintaining an agile and sustainable national big data innovation ecosystem.
[Slide #13: Data Policy]
A final, important component of the data strategy at NSF is our policy framework that enables dissemination and sharing of data and knowledge.
Earlier this year, in mid-March, NSF released its public access plan, in which NSF recognizes the uniqueness of the broad community it supports. The plan is an open, flexible, and incremental approach to provide access to publications, data, and other products of NSF funding. I encourage you to see NSF 15-52 if you are interested in more details about our plan.
Recognizing that data sharing is important not just within our country but also across the globe, let me also point out that we fund the U.S. participation in an international organization, called the Research Data Alliance, which aims to accelerate research data sharing among scientists around the world. This alliance is working to tackle data sharing challenges such as interoperability, stewardship, sustainability, and use.
[Slide #14: Public Participation in Discovery]
We have seen powerful examples in the past few years where open and shared data have enabled profound scientific discoveries. In fact, sharing information and data is at the heart of public participation in scientific research, or crowdsourcing. Consider the game Foldit, an effort led by researchers at the University of Washington and funded by NSF, which enabled players across the world to map the structure of a protein from an AIDS-related virus--a problem that had stumped scientists for 12 years--in just 10 days!
Together, advances in these five areas are essential for accelerating discovery across all areas of science and engineering, enhancing innovation, driving societal benefit, and improving our quality of life.
[Slide #15: Small Data and Mobile Health]
Before I conclude, let me leave you with one more example, but instead of talking about big data, I want to focus on small data. I want to tell you about Deborah Estrin, a computer scientist who has, through a series of events, been compelled to focus her efforts on mobile health and small data. Deborah believes that using an individual's mobile device to capture data and serve them back to that individual can have profound implications for healthcare. Just two weeks before her father passed away, his behaviors changed radically. He stopped walking around the neighborhood, going to the grocery store, and responding to email. A healthcare professional who didn't know him might not have thought any of these behaviors unusual for a 90-year-old, but for Deborah's dad, they were a sign that something was wrong. A mobile phone easily monitors each of these activities, but a platform that serves this information back to an individual--or enables that person to share the data with family or caregivers--wasn't so easy to find. To change that, Deborah co-founded a start-up company, called Open mHealth, to transform the way personal digital data can be used in healthcare. You heard from one of Open mHealth's other cofounders, Ida Sim, yesterday.
The potential for data--both big and small--to transform healthcare is just tremendous!
[Slide #16: Big Data at NSF]
As the stories I've told illustrate, data science will help to enable healthcare that is more evidence-based, personalized, and proactive.
NSF is helping to make this vision a reality through its support of fundamental research to understand data, infrastructure to support science and education communities, education to meet the momentum of this data-centric world, collaborations and partnerships to support interdisciplinary science and accelerate innovation, and policy to enable dissemination and sharing of data.
Harnessing the power of data will not only shape the delivery of healthcare, but also will improve our overall quality of life for decades to come.