Objective 1. Continued Elucidation of Genome Structure and Organization
In the past five years, important guiding concepts have emerged from sequencing projects involving
plants, animals and microbes. One concept has been the importance of focusing on “reference genomes”.
The finished yeast, worm and Arabidopsis genomes represent references that serve as assembly templates
for subsequent draft sequences of genomes from related organisms. Another important concept is the use
of comparative sequence analysis. While a single genome sequence is useful, its value increases
dramatically as additional sequences become available. Comparative genomics uses these resources to
mine extensively for all of the genes and regulatory sequences in a genome. For example, the sequencing of
two mammalian genomes (human and mouse) has revealed new genes that were not found by sequencing
one genome. Comparative genomics research will increase our understanding, at a
sequence level, of the
events that gave rise to new species or to the emergence of specific traits. The sequence resources
developed over the next five years can be used to describe the structure of
individual genomes and also to clarify the dynamic processes that shape genomes.
Contribute to the international effort to finish the rice genome sequence
A deep and highly accurate draft rice sequence was completed in December
2002, representing the first publicly available and most complete rice sequence
information to date. However, additional sequencing to close remaining gaps
will be required to finish this genome and facilitate its use as a reference
genome for cereals. A complete, finished rice sequence will be an essential tool for the broader plant
research community, both basic and applied.
Complete sequencing of the gene-rich regions of the maize genome
Like many of the cereals, maize has a large, complex genome, consisting of about 2,800 million base pairs
(Mbp) of DNA, about the same size as the human genome and 21 times larger than the Arabidopsis genome.
The maize genome organization is complex with more than 80% of the genomic DNA consisting of
repetitive sequences and only about 15%, or about 300 Mbp, encoding genes. While it was not realistic to
contemplate sequencing a genome of this size and complexity in 1998, it is now
possible to sequence the gene-rich regions of the maize genome. Technical challenges are being surmounted by developing efficient
methods to enrich for genes prior to sequencing and then assembling and mapping the sequences onto the
existing master maize genome map. The technologies being developed for sequencing the maize
genome can then be applied to other large and complex genomes, not only those of plants. A complete sequence of the gene-rich regions of the maize genome would augment
available genomic tools to address fundamental questions about gene function, evolution, development and physiology
across all the cereals.
Detailed genome analysis of a few key plant species
Sequencing all plant genomes is still prohibitively expensive because many are large and complex
(Table 1). The current cost is approximately $0.09 per base pair. At this price, a
finished (99.99% accuracy) sequence for wheat would cost $1.44 billion. The National Human
Genome Research Institute intends to develop sequencing technology
in the next decade that will produce complete genome sequences for $1,000 each. Until then, the most efficient use of
NPGI resources will be to develop a set of draft sequences for the gene-rich regions of key plant species,
building on the concept derived from the first five years that reference
genomes are essential genomics tools.
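The cost estimate above is the product of genome size and per-base cost. The following is a minimal sketch of that arithmetic in Python. It assumes a wheat genome of roughly 16,000 Mbp, which is the size implied by the $1.44 billion and $0.09 figures rather than a number stated in the text, and the function and constant names are hypothetical.

```python
# Back-of-envelope sequencing cost: genome size (bp) x cost per base pair.
COST_PER_BP_USD = 0.09  # approximate cost per base pair quoted in the text


def sequencing_cost_usd(genome_size_mbp, cost_per_bp=COST_PER_BP_USD):
    """Return the estimated cost in US dollars for a genome size given in Mbp."""
    return genome_size_mbp * 1_000_000 * cost_per_bp


# Assumption: wheat genome of about 16,000 Mbp, back-calculated from
# $1.44 billion / $0.09 per bp; this size is not stated in the text.
WHEAT_MBP = 16_000
# Maize genome size (about 2,800 Mbp) as quoted in the maize section.
MAIZE_MBP = 2_800

wheat_cost = sequencing_cost_usd(WHEAT_MBP)
maize_cost = sequencing_cost_usd(MAIZE_MBP)

print(f"wheat: ${wheat_cost / 1e9:.2f} billion")
print(f"maize: ${maize_cost / 1e6:.0f} million")
```

Applied to the maize genome size quoted earlier, the same rate gives roughly $252 million for a whole-genome sequence. That figure is illustrative only and does not come from the plan, but it suggests why the emphasis falls on gene-rich regions.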
Criteria for selection of plant species for sequencing will minimally include the
following considerations: (1) Experimental tractability; (2) Complexity of genome
structure; (3) Potential for serving as a reference; and (4) Usefulness of
the sequence information to advance plant science.
Table 1. Size of sample plant genomes (Mbp)
Genome analysis resources for a broad spectrum of plants of biological and economic importance
The majority of plants will not be candidates for detailed genome analysis in the next five years. In these
cases, research needs can be met by the development of deep genetic and physical maps, Expressed
Sequence Tags (ESTs) and Bacterial Artificial Chromosome (BAC) libraries. BAC libraries are relatively
inexpensive to construct and are useful to many researchers who work on unique plant systems and to all
researchers for comparative genomics research. A recently developed process called “Targeted
Comparative Sequencing” uses BAC libraries and is a promising tool for
providing insight into genome evolution. ESTs prepared from unique cell types or from plants grown under specific conditions are especially useful to
identify networks of genes involved in specialized plant processes such as production of secondary
metabolites or responses to specific stimuli.
Understanding the structural basis for plant genome organization
Plants are well suited for studying the structural basis of complex genome organization. Genome
organization contains a record of the evolutionary history of the plant. Thus, comparison of select
examples can reveal the processes that led to the current structure and organization of plant genomes. In
the next few years, additional genome sequences, EST sequences, and other structural genomics resources
will become available. These resources will make it possible to generate detailed, comparative maps for
finding all genes and regulatory sequences, and studying genome evolution across a broad range of
plants. Comparative studies will increase our understanding of the relationships between genome
structure and organization and allow us to begin to ask major unanswered questions in plant
sciences, such as:
Impact of domestication on genome structure and vice versa: Plant genomes, especially those of
cultivated plants, are often radically different from other eukaryotic genomes, both in structure and in
organization. It is likely that many of these differences reflect the strong selection applied during
domestication over thousands of years. During domestication, whole genome duplication, segmental
genome duplication or loss, and genome rearrangements have occurred in a number of crop plants.
Understanding the basic biology of the domestication process will help researchers develop rational
strategies for future crop improvement.
Role of subgenomes in allopolyploids: Many plants are “allopolyploids”, hybrids of two or more
progenitor plants. For example, bread wheat (Triticum aestivum) contains three ancestral genomes
termed A, B and D. The D genome is derived from Aegilops tauschii and contains genes for bread
quality. Having sequence information for these genomes would provide scientists with the tools to
understand how diverse genomes combine to generate new plant species.