
NSF Press Release


Embargoed Until: 1 p.m. Eastern Time
NSF PR 03-55 - May 12, 2003

Media contact:

 David Hart

 (703) 292-7737

Program contact:

 Barbara Fossum

 (703) 292-8962

Pattern Recognition Method Zeroes in on Genes that Regulate Cell's Genetic Machinery

ARLINGTON, Va.—Using a new technique for recognizing patterns in biological databases, a team of U.S. and Israeli computer scientists and geneticists has developed a practical computational method that zeroes in on the genes responsible for controlling the genetic machinery of a cell.

In a paper published online May 12 by Nature Genetics, researchers from Stanford University and, in Israel, Hebrew University and the Weizmann Institute report that their method revealed several previously unknown control, or regulatory, genes in Saccharomyces cerevisiae, better known as baker's yeast. The work was supported in part by an Information Technology Research grant from the National Science Foundation, the independent agency that supports basic research in all fields of science and engineering.

Daphne Koller, a computer science professor at Stanford, is leading an effort to develop general models for recognizing meaningful patterns that span many related databases. This unique ability to "mix and match" biological data sources gives the new method its power.

Ordinarily, regulatory genes are identified experimentally, not computationally. The new computational method makes the experimental process much more efficient. It identifies regulatory candidates for testing in the lab and predicts how each regulator will affect cellular activity. In a demonstration on yeast genome data, the method identified several possible new regulatory genes and the clusters they regulate, and the team has already confirmed three of the predictions in the lab.

The primary data source for the method is gene expression technology, which involves mixing probes for thousands of genes with a biological sample under specific conditions. The probes provide a detailed snapshot, called a microarray, of the genes active in those conditions. A typical experiment would produce microarrays for hundreds of different conditions to see which genes are expressed in each condition.

"Each microarray provides a huge amount of data, and it's very difficult to extract meaningful information from it by eye," Koller said. "Over the past few years, many computational methods have been developed for dealing with this problem." Such methods identify related clusters of a handful or several dozen genes from the resulting data.

The new approach described in Nature Genetics also finds clusters, but it is the first to incorporate data about known and putative regulatory genes and the first to simultaneously predict which gene or genes regulate each cluster.
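As a rough illustration of the cluster-finding step, genes can be grouped when their expression rises and falls together across conditions. The sketch below is a deliberately simple stand-in, not the method from the paper: the gene names, expression values, the 0.9 correlation threshold and the greedy grouping rule are all assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sd_x * sd_y)


def cluster_genes(expression, threshold=0.9):
    """Greedy grouping: a gene joins the first cluster whose seed gene
    it correlates with at or above the threshold, else starts a new one."""
    clusters = []
    for gene, profile in expression.items():
        for cluster in clusters:
            seed_profile = expression[cluster[0]]
            if pearson(profile, seed_profile) >= threshold:
                cluster.append(gene)
                break
        else:
            clusters.append([gene])
    return clusters


# Hypothetical expression levels for four genes under four conditions.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [3.9, 3.2, 1.8, 1.1],
}
clusters = cluster_genes(expression)
print(clusters)
```

With these made-up values, geneA and geneB land in one cluster and geneC and geneD in another. Real microarray data would involve thousands of genes and hundreds of conditions.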

In response to internal or external signals, regulatory genes tell clusters of genes to turn on or off; in other words, to start or stop making proteins. The proteins from each gene cluster, in turn, are responsible for a different cell process. These processes include converting sugar to energy, responding to stress, folding proteins, and building cellular components such as the nucleus.
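One simple way to picture how a regulator might be matched to a cluster is to ask whose expression best tracks the cluster's average activity across conditions. The sketch below ranks hypothetical candidate regulators by the strength of that correlation. It is an assumption-laden simplification for illustration only and does not reproduce the statistical model described in the paper. All names and values are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def rank_regulators(cluster_profiles, regulator_profiles):
    """Rank candidate regulators by |correlation| with the cluster's
    mean expression profile. A positive score suggests an activator,
    a negative score a repressor (under this toy model only)."""
    n_conditions = len(cluster_profiles[0])
    mean_profile = [
        sum(profile[i] for profile in cluster_profiles) / len(cluster_profiles)
        for i in range(n_conditions)
    ]
    scores = {
        name: pearson(profile, mean_profile)
        for name, profile in regulator_profiles.items()
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical cluster of two co-expressed genes, four conditions each.
cluster_profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.2],
]
# Hypothetical candidate regulators measured under the same conditions.
regulator_profiles = {
    "regX": [0.5, 1.0, 1.5, 2.0],  # rises with the cluster
    "regY": [2.0, 1.0, 2.0, 1.0],  # little relation to the cluster
    "regZ": [2.0, 1.6, 0.9, 0.5],  # falls as the cluster rises
}
ranking = rank_regulators(cluster_profiles, regulator_profiles)
for name, score in ranking:
    print(f"{name}: {score:+.2f}")
```

In this toy data, regX ranks first as a possible activator and regZ second as a possible repressor. Either would still be only a candidate for testing in the lab.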

Koller's pattern recognition technique builds on statistical models and the widely used technology of relational databases to look for patterns across many different data sources, such as microarray data, DNA sequence data or protein-protein interaction data. The generality of the method lets the researchers assemble data sets like Lego blocks, plugging a new database into the relational structure and letting the algorithm go to work. To make the results of this type of analysis more accessible to biologists, Koller's group has developed the GeneXPress visualization and exploration tool, freely available on the web.
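The "Lego block" idea of plugging a new data source into a relational structure can be sketched with an ordinary relational join. The example below uses Python's built-in sqlite3 module. The table names, column names, gene names and motif labels are all hypothetical, and the query shows only how two data sources might be combined, not the statistical analysis itself.

```python
import sqlite3

# In-memory database standing in for a collection of biological data sources.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cluster_membership (gene TEXT, cluster TEXT);
    CREATE TABLE promoter_motif (gene TEXT, motif TEXT);
    """
)

# Source 1 (hypothetical): clusters derived from microarray data.
conn.executemany(
    "INSERT INTO cluster_membership VALUES (?, ?)",
    [("geneA", "c1"), ("geneB", "c1"), ("geneC", "c2"), ("geneD", "c2")],
)

# Source 2 (hypothetical): motifs found in DNA sequence near each gene.
# Adding another source would mean adding another table keyed on gene.
conn.executemany(
    "INSERT INTO promoter_motif VALUES (?, ?)",
    [
        ("geneA", "motif1"),
        ("geneB", "motif1"),
        ("geneC", "motif2"),
        ("geneD", "motif1"),
    ],
)

# Join the two sources: how many genes in each cluster carry each motif?
rows = conn.execute(
    """
    SELECT c.cluster, m.motif, COUNT(*) AS n_genes
    FROM cluster_membership AS c
    JOIN promoter_motif AS m ON m.gene = c.gene
    GROUP BY c.cluster, m.motif
    ORDER BY c.cluster, m.motif
    """
).fetchall()

for cluster, motif, n_genes in rows:
    print(cluster, motif, n_genes)
conn.close()
```

Because every table is keyed on the gene, a further source such as protein-protein interaction data could be joined in the same way without changing the existing tables.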

"Knowing the control mechanism for gene clusters is crucial for understanding how cells respond to internal and external signals," said team member David Botstein, a professor of genetics at Stanford. "This new computational method efficiently generates targets for testing and proposes hypotheses about their regulatory roles that can be experimentally confirmed."

Authors of the Nature Genetics paper also include Koller's graduate student Eran Segal; Hebrew University professor Nir Friedman and his graduate student Dana Pe'er; Michael Shapira, a post-doctoral researcher in Botstein's group; and Aviv Regev of the Weizmann Institute, currently a fellow at Harvard University's Bauer Center for Genomics Research. In addition to NSF support, the team members received support from several institutional awards, the Colton Foundation, and the Israeli Ministry of Science.


Koller group:
Saccharomyces Genome Database:
Friedman group:

Principal Investigators: Daphne Koller, 650-723-6598,
David Botstein, 650-723-3488,
Nir Friedman, +972-2-658-4720,

NSF is an independent federal agency that supports fundamental research and education across all fields of science and engineering, with an annual budget of nearly $5 billion. NSF funds reach all 50 states through grants to nearly 2,000 universities and institutions. Each year, NSF receives about 30,000 competitive requests for funding, and makes about 10,000 new funding awards. NSF also awards over $200 million in professional and service contracts yearly.

Receive official NSF news electronically through the e-mail delivery system, NSFnews. To subscribe, send an e-mail message to In the body of the message, type "subscribe nsfnews" and then type your name. (Ex.: "subscribe nsfnews John Smith")




National Science Foundation
Office of Legislative and Public Affairs
4201 Wilson Boulevard
Arlington, Virginia 22230, USA
Tel: 703-292-8070
FIRS: 800-877-8339 | TDD: 703-292-5090
