Objective 4. Bioinformatics in Every Plant Scientist’s Research Tool Box

NPGI has produced, and will continue to produce, enormous amounts of plant genome data, which need to be made accessible to a broad community of scientists in a usable form. One measure of the success of the NPGI will be the direct usefulness of its genome information. Significant and broad efforts should be directed toward programs that enable individuals or groups to access, analyze, and compare data. The engineering of information systems, the development of data-mining tools, and the creation of computation-based predictive models for functional analysis will continue to advance the goals of the NPGI.

Develop informatics tools to access and use plant genome databases

High-throughput genomics technologies have led to a flood of sequence information, gene expression array data, and map data. All databases must have capabilities that allow the broadest possible access by the community. Open access will lead to the widest use of the data and to the development of innovative and more sophisticated tools, which, in turn, will enable individuals or groups to access and query all current and future resources in the most imaginative ways possible.

The plant community should use existing informatics tools to the largest extent possible. For example, the Generic Model Organism Database (GMOD) project is a joint project between the National Human Genome Research Institute and the Agricultural Research Service that aims to develop generic software modules for functions common to model organism genome databases. Networking with this and other similar efforts will be an efficient and cost-effective way to leverage investments already made.

Build community databases with standards for interoperability

Databases should be developed that incorporate a common set of standards and interfaces so that individuals located anywhere in the world can make full use of all publicly available resources. One way to make databases interoperable is through the development of controlled vocabularies. The plant community (e.g., Arabidopsis and maize) is already actively participating in the Gene Ontology (GO) consortium, which is developing controlled vocabularies for all model genomes. This effort should be encouraged for any new communities and datasets.
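
The role of a controlled vocabulary in interoperability can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two databases, their free-text labels, the gene names, and the term identifiers are hypothetical placeholders, not real Gene Ontology accessions or real database contents. The point is only that once each database maps its local labels onto shared terms, a single query can span both.

```python
# Hypothetical shared vocabulary: identifiers are placeholders,
# not real Gene Ontology accessions.
SHARED_VOCABULARY = {
    "TERM:0001": "photosynthesis",
    "TERM:0002": "nitrogen fixation",
}

# Each hypothetical database maps its own free-text labels to shared terms.
ARABIDOPSIS_DB_LABELS = {
    "light harvesting": "TERM:0001",
    "photosynth.": "TERM:0001",
}
MAIZE_DB_LABELS = {
    "Photosynthesis (C4)": "TERM:0001",
    "N2 fixation": "TERM:0002",
}

# Hypothetical gene records: (gene name, local free-text label).
ARABIDOPSIS_DB_GENES = [("geneA1", "light harvesting"), ("geneA2", "photosynth.")]
MAIZE_DB_GENES = [("geneM1", "Photosynthesis (C4)"), ("geneM2", "N2 fixation")]


def annotate(genes, label_map):
    """Replace each local label with its shared-vocabulary term identifier."""
    return [(gene, label_map[label]) for gene, label in genes]


def genes_for_term(term_id, *annotated_sets):
    """Return the sorted gene names annotated with term_id in any database."""
    return sorted(
        gene
        for annotated in annotated_sets
        for gene, term in annotated
        if term == term_id
    )


arabidopsis = annotate(ARABIDOPSIS_DB_GENES, ARABIDOPSIS_DB_LABELS)
maize = annotate(MAIZE_DB_GENES, MAIZE_DB_LABELS)

# One query now covers both databases despite their different local labels.
photosynthesis_genes = genes_for_term("TERM:0001", arabidopsis, maize)
print(SHARED_VOCABULARY["TERM:0001"], photosynthesis_genes)
```

Without the shared identifiers, the same query would need to know every local spelling used by every database, which is the interoperability problem that controlled vocabularies are meant to avoid.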

Institute an internationally coordinated data repository mechanism

Data repository mechanisms and standard operating principles for the reference species Arabidopsis and rice, as well as for species-specific databases, are critical and should be coordinated internationally.

Develop new algorithms to analyze plant genomics data

Computational resources and analytical tools that can mine genomic data and lead to hypothesis testing, validation, and application are essential and should be made freely available. Algorithms for comparative genomics and population genetics are especially needed at this time. These tools will lead to increased knowledge of fundamental plant processes such as photosynthesis, respiration, carbon and nitrogen metabolism, nitrogen fixation, primary and secondary metabolism, polyploidy and domestication, and plant-microbe associations.
