Bringing Together Heterogeneous Scientific Data
A recently launched computer network aims to bring together data from
thousands of ecological and environmental scientists spread across the nation
and around the world. It relies on a software infrastructure designed to
overcome the problems associated with heterogeneous data sources.
The Knowledge Network for Biocomplexity (KNB) enables the
efficient discovery, access, interpretation, integration, and analysis of
complex ecological data from a highly distributed set of field stations,
laboratories, research sites, and individual researchers. Development of the
KNB was supported by a grant from the KDI (Knowledge and Distributed
Intelligence) program of the National Science Foundation.
The KNB (http://knb.ecoinformatics.org)
involves a national network of federated institutions that have agreed to
share data and metadata using a common framework. That framework is built
around two components: the Ecological Metadata Language (EML), a common
language for describing ecological data, and the Metacat metadata server, a
flexible XML-based database built for storing a wide variety of metadata.
In addition, the network is using software called the
Storage Resource Broker, a distributed data system developed at the San Diego
Supercomputer Center, for linking the highly distributed set of ecological
field stations and universities housing ecological data.
A scientist involved with KNB, Matthew B. Jones of the
National Center for Ecological Analysis and Synthesis (NCEAS) at the University
of California at Santa Barbara, says, "What we've done is build an
infrastructure that allows you to decide in metadata what the different data
sources are. And then we built a query and data management system on top of
that metadata. It works quite well in handling pretty much arbitrary data
types. In other words, the database system does not need to know about the
schema and details of the data sources in order to query them."
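
The idea Jones describes, querying over metadata rather than over each
source's own schema, can be illustrated with a minimal Python sketch. It is
not Metacat's or KNB's actual code or API: the XML element names, the record
contents, and the find_datasets helper are all hypothetical, and the records
are deliberately simplified rather than valid EML.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata records for two unrelated data sources.
# Element and attribute names are illustrative only, not real EML.
RECORDS = [
    """<dataset id="site-a.1">
         <title>Stream temperature, 1998</title>
         <attribute name="temp_c" unit="celsius"/>
         <attribute name="depth_m" unit="meter"/>
       </dataset>""",
    """<dataset id="site-b.7">
         <title>Kelp forest fish counts</title>
         <attribute name="species" unit="none"/>
         <attribute name="count" unit="number"/>
       </dataset>""",
]


def find_datasets(records, tag, attr, value):
    """Return ids of records containing a <tag> element whose attr equals value.

    The function knows nothing about any dataset's columns in advance;
    it only walks whatever the metadata document declares.
    """
    hits = []
    for text in records:
        root = ET.fromstring(text)
        if any(el.get(attr) == value for el in root.iter(tag)):
            hits.append(root.get("id"))
    return hits


if __name__ == "__main__":
    print(find_datasets(RECORDS, "attribute", "unit", "celsius"))
```

Here find_datasets(RECORDS, "attribute", "unit", "celsius") selects only the
first record. The query never reads the underlying data tables; it inspects
only the metadata that describes them, which is the property the quote
emphasizes.
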
Jones says that KNB's user authentication system so far has
more than 5,000 users. Participating in the effort are 24 LTER (Long-Term
Ecological Research) sites that collect data around the country. Also agreeing to
participate are about 180 sites in the Organization of Biological Field
Stations. According to Jones, "Building the infrastructure is a very different
thing from having knowledge of and participation in the network. There are a
lot of scientists out there, and it's hard to necessarily convince them that
it's worth their time and effort."