
News Release 14-012

National Science Foundation contributes to four international projects in data-intensive social science and humanities research

Transatlantic Digging into Data Challenge 2013 winners announced


International researchers receive grants to investigate computational techniques in the humanities.

January 16, 2014

This material is available primarily for archival purposes. Telephone numbers or other contact information may be out of date; please see current contact information at media contacts.

Ten international research agencies, including the United States' National Science Foundation (NSF), recently announced the winners of the third Digging into Data Challenge.

The Digging into Data program gives research teams the ability to develop new insights, tools and skills in innovative social science and humanities research using large-scale data analysis.

Fourteen teams representing Canada, the Netherlands, the United Kingdom and the United States will receive grants to investigate how computational techniques can be applied to "big data" in the social sciences and humanities. Each team is a collaboration among scholars, scientists and information professionals from leading universities and libraries in Europe and North America.

"With the National Science Foundation as a funding organization, Digging into Data continues to be an excellent mechanism for social science scholars carrying out data intensive research," said Elizabeth Tran, NSF program manager for Digging into Data. "Going forward, we will look to identify ways in which we can make the datasets and tools that have come out of 'Digging' more accessible to the broader research community."

The first round of the Digging into Data Challenge was held in 2009 and the second in 2011. Previous Digging into Data research projects have received international attention.

In the current round, 10 sponsoring agencies from four countries are jointly funding 14 projects, with total funding of about US$5.1 million. The projects cover a wide variety of topics.

Additional information about the competition can be found at the Digging into Data website.

Digging into Data projects receiving NSF funding are highlighted below.

Organizing and Uniting Linguistic Databases (the COULD project)
Principal Investigators: Maria Polinsky, Harvard University, US; Alan Bale, Concordia University, Canada
Abstract: The COULD project has five goals. (1) It seeks to transfer existing linguistic data from a variety of formats into a universal format that will allow linguists to combine and share information, not only with other linguists but also with the public at large. (2) The project will build applications that automatically correct errors, flag inconsistencies and fill gaps in the data. (3) These automated mechanisms will provide new tools for detecting patterns that are not apparent in smaller databases. (4) By developing search algorithms that facilitate lesson creation, the project seeks to make vast amounts of linguistic data, currently used only by researchers, available to second-language learners. (5) The project will make data collection easier and thus make language preservation and documentation less dependent on experts. Communities trying to revive endangered languages will benefit directly from this project.

Field Mapping: An Archival Protocol for Social Science Research Findings
Principal Investigators: Frank Bosco, Virginia Commonwealth University, US; Piers Steel, University of Calgary, Canada
Abstract: In this project, psychology and management scholars from the United States and Canada will collaborate with an expert in online research and classification methods to devise a web application that will (1) enable the encoding of millions of individual findings in a multidisciplinary social science research domain, (2) facilitate complex analyses and (3) provide open access to members of the scholarly community and the general public. The project provides protocols for extracting research findings and classifying them into a semantic taxonomy. Establishing this taxonomy will change how researchers search for and analyze findings from big data. The project will also develop efficient algorithms to access and analyze research findings. The eventual goal is a comprehensive, continuously updated repository of social science research findings that responds to dynamic queries.

Legal Structures
Principal Investigators: Adam Badawi, Washington University School of Law, US; Rens Bod, University of Amsterdam, Netherlands
Abstract: This project takes a radically novel approach to the problem of measuring and visualizing differences among legal systems: it focuses on machine coding of internal references in codes and laws. Internal referencing is an inherent characteristic of codes. Even the Code of Hammurabi, almost 3,800 years ago, was structured as a numbered list of laws with at least one cross-reference. The intuition behind this approach is that fundamental differences among legal systems manifest themselves in the structure of the texts and can be detected, parameterized and visualized using computer algorithms. For instance, the French Civil Code, based on a deductive ideal of legal thought, has fewer internal references than the German Civil Code, which is a hundred years younger, was influenced by the idea that law finds its legitimacy in a country's history rather than in natural principles, and is hence less organically structured. This project will apply this procedure to analyze the world's codes.

MIning Relationships Among variables in large datasets from CompLEx systems (MIRACLE)
Principal Investigators: C. Michael Barton, Arizona State University, US; Tatiana Filatova, University of Twente, Netherlands; Terence P. Dawson, University of Dundee, UK; Dawn Cassandra Parker, University of Waterloo, Canada
Abstract: Social scientists have used agent-based models (ABMs) to explore the interactions and feedbacks among social agents and their environments. The bottom-up structure of ABMs enables detailed simulation and investigation of complex systems and their emergent behavior; however, the stochastic nature of such models and the many possible combinations of their parameters produce large, nonlinear, multidimensional "big data" that are difficult to analyze using traditional statistical methods. The proposed project seeks to address this challenge by developing algorithms and web-based analysis and visualization tools that provide automated means of discovering complex relationships among variables. The tools will enable modelers to easily manage, analyze, visualize and compare their output data, and will give stakeholders, policymakers and the general public intuitive web interfaces for exploring, interacting with and providing feedback on otherwise difficult-to-understand models.


Media Contacts
Deborah Wing, NSF, (703) 292-5344,

Program Contacts
Elizabeth Tran, NSF, (703) 292-5338,

The National Science Foundation (NSF) is an independent federal agency that supports fundamental research and education across all fields of science and engineering. In fiscal year (FY) 2017, its budget is $7.5 billion. NSF funds reach all 50 states through grants to nearly 2,000 colleges, universities and other institutions. Each year, NSF receives more than 48,000 competitive proposals for funding and makes about 12,000 new funding awards.
