News Release 11-028

A Scientific Gold Rush: Electronic Mining of Published Research

The journal Science publishes an important paper on harvesting vast amounts of "metaknowledge"

Perspective article argues that electronically mined research may lead to future breakthroughs.


February 10, 2011

This material is available primarily for archival purposes. Telephone numbers or other contact information may be out of date; please see current contact information at media contacts.

The knowledge of knowledge. The science of science. Riddles? No. A burgeoning and important field of scientific research that examines research itself, say University of Chicago Sociology Assistant Professor James Evans and Post-doctoral Scholar Jacob Foster. Their analysis, supported by the National Science Foundation (NSF), is published in a perspective piece to appear in the Feb. 11 issue of the journal Science.

A scientific approach to delving into the knowledge of knowledge--metaknowledge--offers great potential for new discovery, they argue. New possibilities may arise when one uncovers scientific bias or possible "ghost theories," or gains an understanding of the context of research, and then either accounts for those factors or eliminates them before engaging in new research.

"We review the expanding scope of metaknowledge research, which uncovers regularities in scientific claims and infers the beliefs, preferences, research tools and strategies behind those regularities. Metaknowledge research also investigates the effect of knowledge context on content. Teams and collaboration networks, institutional prestige and new technologies all shape the substance and direction of research."

Metaknowledge can be very useful to a variety of disciplines and fields. Evans' and Foster's research, while primarily funded by NSF's Science of Science and Innovation Policy program, was co-funded by NSF's Division of Chemistry, which is interested in reviewing developments in chemistry over time.

Metaknowledge may also be useful in shedding light on shorter-term questions. Google used computational content analysis to identify the emergence of influenza outbreaks by identifying and tracking related Google searches. The process was faster than other techniques typically used by health officials.

"Collaboration is revealed to be much more important to the future of science policy," explains Julia Lane, director of NSF's Science of Science and Innovation Policy program (the other co-funder of this research). "As the perspective so aptly put it, 'the rise in scientific review articles and the concomitant explosion of scientific publications over the past century trace a growing supply and demand for the focused assessment and synthesis of research claims. As the number of analyses investigating a particular claim has become unmanageable ... researchers have increasingly engaged in meta-analysis--counting, weighting and statistically analyzing the census of published findings on the topic.'"

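The "counting, weighting and statistically analyzing" that the perspective describes can be illustrated with a minimal sketch of one common approach, fixed-effect inverse-variance pooling. This is not the authors' method; the function name and the effect sizes and standard errors below are hypothetical values chosen only for illustration.

```python
import math


def fixed_effect_meta(effects, std_errors):
    """Pool study effects using inverse-variance (fixed-effect) weights.

    Returns the number of studies, the pooled effect, and its standard error.
    """
    # Weighting: more precise studies (smaller standard error) count for more.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Counting: how many published findings entered the analysis.
    return len(effects), pooled, pooled_se


# Hypothetical published findings on a single claim (illustrative only).
effects = [0.30, 0.10, 0.25]
std_errors = [0.10, 0.20, 0.10]

n_studies, pooled, pooled_se = fixed_effect_meta(effects, std_errors)
print(f"{n_studies} studies, pooled effect {pooled:.3f} (SE {pooled_se:.3f})")
```

The pooled estimate sits closer to the two precise studies than to the noisy one, which is the point of weighting the census of findings rather than simply averaging it.
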
According to the perspective's authors, metaknowledge sheds light on the role funding plays in science. "There is evidence from the metaknowledge that embedding research in the private or public sector modulates its path," Evans and Foster write. "Company projects tend to eschew dogma in an important hunt for commercial breakthroughs, leading to rapid but unsystematic accumulation of knowledge, whereas public research focuses on the careful accumulation of consistent results."

A promise of metaknowledge, they argue, is also its capacity to steer researchers into new fields: "Metaknowledge could inform individual strategies about research investment, pointing out overgrazed fields where herding leads to diminishing returns as well as lush range where premature certainty has halted promising investigation."

The ability of metaknowledge researchers to see connections and uncover previously missed aspects of research is powered, in part, by the growth of natural language processing (NLP), one of the rapidly emerging fields of artificial intelligence, largely supported by NSF's Directorate for Computer and Information Science and Engineering.

NLP enables massive amounts of information, the details of fantastic discoveries and vast quantities of research funded by NSF and other organizations, to be electronically mined. Then machines can read, extract information from, and summarize enormous amounts of data.
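
As a rough illustration of what electronic mining of text can look like at its simplest, the sketch below counts how many documents mention each term. It is a toy example under stated assumptions, not the NLP tooling used in the research: the stopword list, the function name and the three sample abstracts are all hypothetical.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; real systems use far larger ones.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "is", "on", "for"}


def term_document_counts(abstracts):
    """Count the number of abstracts in which each non-stopword term appears."""
    counts = Counter()
    for text in abstracts:
        # Lowercase, keep alphabetic tokens, and deduplicate within a document.
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        counts.update(tokens - STOPWORDS)
    return counts


# Hypothetical abstracts, written only for this example.
abstracts = [
    "Collaboration networks shape the direction of research.",
    "Institutional prestige and collaboration shape research claims.",
    "New technologies change research tools.",
]

counts = term_document_counts(abstracts)
for term, n_docs in counts.most_common(3):
    print(term, n_docs)
```

Production text mining goes well beyond term counts, but the same read, extract and summarize pattern applies at scale.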

"Extraordinary advances in computational abilities enable social scientists to further delve into the data in order that we may understand the sweep of science," said Lane, "the context, social networks, physical and institutional settings--the many factors that shape the findings themselves."

-NSF-

Media Contacts
Lisa-Joy Zgorski, NSF, (703) 292-8311, email: lisajoy@nsf.gov
William Harms, University of Chicago, (773) 702-8356, email: wharms@uchicago.edu

Program Contacts
Julia I. Lane, NSF, (703) 292-5145, email: jlane@nsf.gov

The U.S. National Science Foundation propels the nation forward by advancing fundamental research in all fields of science and engineering. NSF supports research and people by providing facilities, instruments and funding to support their ingenuity and sustain the U.S. as a global leader in research and innovation. With a fiscal year 2023 budget of $9.5 billion, NSF funds reach all 50 states through grants to nearly 2,000 colleges, universities and institutions. Each year, NSF receives more than 40,000 competitive proposals and makes about 11,000 new awards. Those awards include support for cooperative research with industry, Arctic and Antarctic research and operations, and U.S. participation in international scientific efforts.
