rVista for Comparative Sequence-Based Discovery of Functional Transcription Factor Binding Sites

Gabriela G. Loots¹,⁴, Ivan Ovcharenko¹, Lior Pachter², Inna Dubchak¹,³,⁴, and Edward M. Rubin¹
¹Genome Sciences Department, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA; ²Department of Mathematics, University of California at Berkeley, Berkeley, California 94720, USA; ³National Energy Research Supercomputing Center, Lawrence Berkeley National Laboratory, California 94720, USA.

Abstract

Identifying transcriptional regulatory elements is a major challenge in annotating the genomes of higher vertebrates. We have developed a computational tool, rVISTA, for high-throughput discovery of cis-regulatory elements. It combines clustering of predicted transcription factor binding sites (TFBSs) with analysis of interspecies sequence conservation to maximize the identification of functional sites. To assess how well rVISTA discovers true positive TFBSs while minimizing false positive predictions, we analyzed the distribution of several TFBSs across 1 Mb of the well-annotated cytokine gene cluster (Hs5q31; Mm11). Because a large number of AP-1, NFAT, and GATA-3 sites have been experimentally identified in this interval, we focused our analysis on the distribution of all binding sites specific for these transcription factors. Using the orthologous human–mouse dataset eliminated >95% of the ∼58,000 binding sites predicted from the human sequence alone, while still identifying 88% of the experimentally verified binding sites in this region.
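The filtering idea described above (keep a predicted site only when the orthologous sequence supports it) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not rVISTA's implementation: a consensus regular expression for AP-1 (TGA[C/G]TCA) stands in for a real binding-site predictor, the clustering step is omitted, and the helper names (`site_columns`, `window_identity`, `conserved_sites`) and the `window` and `min_identity` defaults are hypothetical. A human site is kept only when a mouse site for the same motif starts in the same column of the pairwise alignment and the window around it is sufficiently identical.

```python
from __future__ import annotations

import re

# HYPOTHETICAL stand-in for a real TFBS predictor: the AP-1 consensus
# TGA(C/G)TCA as a regex. The zero-width lookahead lets finditer report
# overlapping matches.
AP1_SITE = re.compile(r"(?=TGA[CG]TCA)")
AP1_LEN = 7


def column_map(aligned: str) -> list[int]:
    """Alignment column of each ungapped residue in a gapped sequence."""
    return [col for col, ch in enumerate(aligned) if ch != "-"]


def site_columns(aligned: str, pattern: re.Pattern) -> list[int]:
    """Alignment columns at which motif matches start in one aligned sequence."""
    cols = column_map(aligned)
    ungapped = aligned.replace("-", "")
    return [cols[m.start()] for m in pattern.finditer(ungapped)]


def window_identity(a: str, b: str, start: int, end: int) -> float:
    """Fraction of alignment columns in [start, end) that are identical and ungapped."""
    start, end = max(0, start), min(len(a), end)
    if end <= start:
        return 0.0
    same = sum(1 for c in range(start, end) if a[c] == b[c] != "-")
    return same / (end - start)


def conserved_sites(human: str, mouse: str, pattern: re.Pattern, motif_len: int,
                    window: int = 24, min_identity: float = 0.8) -> list[int]:
    """Keep human sites that (1) align to a mouse site of the same motif and
    (2) lie in a window whose identity is >= min_identity.
    The window and min_identity defaults are HYPOTHETICAL, not rVISTA's settings."""
    human, mouse = human.upper(), mouse.upper()
    if len(human) != len(mouse):
        raise ValueError("aligned sequences must have equal length")
    mouse_cols = set(site_columns(mouse, pattern))
    kept = []
    for col in site_columns(human, pattern):
        if col not in mouse_cols:
            continue  # predicted in human only: discard
        start = col + motif_len // 2 - window // 2  # window centered on the site
        if window_identity(human, mouse, start, start + window) >= min_identity:
            kept.append(col)
    return kept


if __name__ == "__main__":
    human = "CCTGACTCACCGGTTTGAGTCAGG"
    mouse = "CCTGACTCA-CGGTTTGCGTCAGG"  # gap at column 9; second site mutated
    print(conserved_sites(human, mouse, AP1_SITE, AP1_LEN, window=11))  # [2]
```

In this toy alignment both AP-1 sites are predicted in human, but only the first has a matching mouse site in a conserved window, so only it survives. For scale, eliminating more than 95% of ∼58,000 predictions leaves fewer than about 2,900 candidate sites.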

Footnotes

  • ⁴Corresponding authors.

  • E-MAIL ggloots@lbl.gov; ildubchak@lbl.gov; FAX (510) 486-6746.

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.225502. Article published online before print in April 2002.

    • Received November 27, 2001.
    • Accepted March 7, 2002.