Trends in Genetics
Volume 29, Issue 1, January 2013, Pages 11-22

Review
Deciphering the transcriptional cis-regulatory code

https://doi.org/10.1016/j.tig.2012.09.007

Information about developmental gene expression resides in defined regulatory elements, called enhancers, in the non-coding part of the genome. Although cells reliably utilize enhancers to orchestrate gene expression, a cis-regulatory code that would allow their interpretation has remained one of the greatest challenges of modern biology. In this review, we summarize studies from the past three decades that describe progress towards revealing the properties of enhancers and discuss how recent approaches are providing unprecedented insights into regulatory elements in animal genomes. Over the next years, we believe that the functional characterization of regulatory sequences in entire genomes, combined with recent computational methods, will provide a comprehensive view of genomic regulatory elements and their building blocks and will enable researchers to begin to understand the sequence basis of the cis-regulatory code.

Section snippets

Regulatory information is encoded in defined DNA sequence elements

During development, a single precursor cell (the fertilized egg) gives rise to a complex multicellular organism comprising a large variety of cell types. This process is heritable and repeats generation after generation according to a developmental program that is encoded in the genome. This program governs the dynamic regulation of gene expression in response to environmental and developmental stimuli and thus determines the differentiation of cell types, their morphologies and functions, and

Understanding the cis-regulatory code

The comparison between protein-coding and regulatory DNA sequences, that is, between the genetic and the regulatory code, is interesting and instructive. Proteins are encoded in open reading frames (ORFs) that are defined by start and stop codons in between which the ungapped sequential occurrence of 61 nucleotide triplets or ‘codons’ determines the linear amino acid sequence of the protein. The genetic code therefore amounts to a simple mapping of all 4³ (= 64) possible triplets redundantly to
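The triplet counts quoted above can be checked with a short enumeration. This is a minimal sketch that assumes the standard genetic code, whose three stop codons (TAA, TAG, TGA in the DNA alphabet) are not listed in this excerpt; the variable names are illustrative.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
# Assumption: standard genetic code, written in the DNA alphabet.
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Every ordered triplet over a four-letter alphabet: 4**3 combinations.
all_codons = ["".join(triplet) for triplet in product(NUCLEOTIDES, repeat=3)]

# Triplets that specify an amino acid rather than terminating translation.
sense_codons = [codon for codon in all_codons if codon not in STOP_CODONS]

print(len(all_codons), "possible triplets;", len(sense_codons), "sense codons")
```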

Enumerating the parts

The classical approach to studying the rules that govern enhancer function has been the exhaustive characterization of individual enhancers, such as eve stripe 2 (Figure 1a,b; see [23] and references therein) and sparkling in Drosophila [18,24] or the interferon-beta enhancer in mammals [25]. These studies established much of what is known today about the genomic properties of enhancers and suggested that the main ‘building blocks’ of the cis-regulatory code are TF motifs; the presence of certain

Building block I: TF motifs

TF motifs capture the DNA sequence preferences of TFs and are therefore typically short and degenerate; they can be represented by IUPAC consensus sequences or, more flexibly, by position-specific weight matrices (PWMs). TF motifs are found in enhancer sequences and their importance for enhancer function has often been demonstrated by the combination of genetic and biochemical approaches. For example, in-depth analysis identified motifs for five TFs in eve stripe 2, arguably the most
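To make the PWM representation concrete, the following sketch builds a log-odds matrix from aligned binding-site counts and scans a sequence for its best-scoring window. The counts, the pseudocount, and the uniform background of 0.25 per base are hypothetical choices for illustration, not values from any TF discussed here, and only one strand is scanned.

```python
import math

BASES = "ACGT"

# Hypothetical base counts at each position of a 4-bp motif
# (10 aligned sites; the last position is degenerate: C or T).
COUNTS = [
    {"A": 1, "C": 0, "G": 0, "T": 9},
    {"A": 2, "C": 0, "G": 8, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 5, "G": 0, "T": 5},
]


def build_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position counts into log2 odds against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + len(BASES) * pseudocount
        pwm.append({
            base: math.log2((column[base] + pseudocount) / total / background)
            for base in BASES
        })
    return pwm


def score(pwm, site):
    """Sum the log-odds contributions of each base in a candidate site."""
    return sum(column[base] for column, base in zip(pwm, site))


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window on the given strand."""
    width = len(pwm)
    return max(
        (score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


pwm = build_pwm(COUNTS)
top_score, top_start = best_hit(pwm, "CCTGACGG")
print(f"best window starts at {top_start} with score {top_score:.2f}")
```

Unlike a consensus string, the matrix scores every candidate site on a continuous scale, so weaker matches can be ranked rather than simply accepted or rejected.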

Building block II: TF cooperativity

Interestingly, in vivo TF binding studies using ChIP revealed two surprises: a TF binds to only a small fraction of all motif occurrences in the genome, and the binding sites differ between contexts (i.e., tissue or cell type), suggesting that the TF motif alone is not sufficient to direct in vivo binding (e.g., [5,56,57,58] and references therein).
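A rough calculation shows how often a short degenerate motif is expected to occur by chance, which gives a sense of the gap between motif occurrences and bound sites. The sketch below assumes a hypothetical consensus (TGAY), a single strand, and independent, equally frequent bases; real genomes violate these assumptions, so the numbers are only indicative.

```python
import re

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus into a regex that also reports overlapping hits."""
    body = "".join(f"[{IUPAC[code]}]" for code in consensus)
    return re.compile(f"(?=({body}))")  # zero-width lookahead keeps overlaps


def find_sites(consensus, sequence):
    """Return the start positions of all matches on the given strand."""
    return [m.start() for m in consensus_to_regex(consensus).finditer(sequence)]


def expected_chance_hits(consensus, sequence_length):
    """Expected matches in random sequence with uniform base frequencies."""
    p_match = 1.0
    for code in consensus:
        p_match *= len(IUPAC[code]) / 4
    return p_match * (sequence_length - len(consensus) + 1)


print(find_sites("TGAY", "CCTGACGGTGATT"))
print(expected_chance_hits("TGAY", 1003))
```

Under these assumptions a 4-bp motif with one two-fold degenerate position matches about once every 128 positions, so occurrences in a whole genome would far outnumber the sites a TF is observed to bind.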

Some of these differences may be due to the presence of additional factors required for TF binding in vivo. Indeed, the comparison of binding sites

Building block III: additional enhancer sequence features

The detailed analysis of individual enhancers, such as the enhancer sparkling, which drives the expression of shaven [the Drosophila homolog of vertebrate paired box 2 (Pax2)] in cone cells of the developing Drosophila eye, has revealed the complexity of enhancer sequences [18,24]: even after the detailed dissection of all binding sites for relevant TFs, the sequences between these were found to be essential [18]. This suggests that much of what is needed for enhancer activity is still unknown,

Assembling the building blocks: ‘Enhancer Grammar’

Detailed dissections of individual enhancers have also provided insights into the relative arrangements of TF motifs: within the eve stripe 2 enhancer, for example, the precise positioning and orientation of TF binding sites appears to be less important than the combined input of the TFs. Such flexibility has also been observed for other developmental enhancers (e.g., [18,24,61]) and might be a general property of the regulatory code (Figure 2b; see also below). In sharp contrast to this model

Predicting regulatory function from the building blocks

If gene expression is determined by a regulatory code that can be generalized across different enhancers with similar functions, it should be possible to learn rules (e.g., about the presence of TF motifs or binding sites) from known enhancers (training set) and predict the functionality of previously unseen sequences (test set) (Figure 4a). If truly independent sequences are used during training and testing, such ‘cross-validated’ predictions allow powerful conclusions to be drawn; in addition
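The train/test logic described above can be sketched with leave-one-out cross-validation, in which each sequence is predicted by a model trained on all the others. The motifs, sequences, and labels below are hypothetical toy data, and the nearest-centroid classifier on motif counts is a deliberately simple stand-in, not one of the methods covered in this review.

```python
# Hypothetical motifs used as features (counted on one strand only).
MOTIFS = ["TGAC", "GGAA"]

# Toy labelled sequences: True = active enhancer, False = inactive.
DATA = [
    ("ATTGACCGGAAT", True),
    ("GGAATTTGACTGACA", True),
    ("CTGACAGGAAGGAAC", True),
    ("ATATATCGCGAT", False),
    ("CCCTTTAAAGGG", False),
    ("ATTGACATATAT", False),
]


def features(sequence):
    """Represent a sequence by how often each motif occurs in it."""
    return [sequence.count(motif) for motif in MOTIFS]


def centroid(rows):
    """Average feature vector of a group of sequences."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def predict(x, active_centroid, inactive_centroid):
    """Call a sequence active if it lies closer to the active centroid."""
    return squared_distance(x, active_centroid) < squared_distance(x, inactive_centroid)


def leave_one_out_accuracy(data):
    """Hold out each sequence in turn; train only on the remaining ones."""
    correct = 0
    for i, (sequence, label) in enumerate(data):
        training = data[:i] + data[i + 1:]  # the held-out item is never seen
        active = centroid([features(s) for s, y in training if y])
        inactive = centroid([features(s) for s, y in training if not y])
        correct += predict(features(sequence), active, inactive) == label
    return correct / len(data)


print("leave-one-out accuracy:", leave_one_out_accuracy(DATA))
```

The key design point is that the held-out sequence contributes nothing to the centroids used to classify it; with real data, near-duplicate or overlapping sequences would also have to be kept on the same side of the split for the test to remain independent.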

Towards a mechanistic and quantitative understanding of enhancers

Several modeling approaches based on thermodynamics or logistic regression have been used to predict expression and to seek a quantitative and mechanistic understanding of enhancer function and gene expression from first principles (reviewed in [76]; Figure 4b). Such approaches are informed by biological and biophysical knowledge, and attempt to model the binding of activators and repressors (TFs) to the DNA, the recruitment of intermediate proteins, such as cofactors, or mediator components,
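As an illustration of the thermodynamic style of model, the sketch below computes equilibrium binding probabilities for one site and for two sites with a cooperativity factor. It uses the generic textbook statistical-mechanics form and is not taken from the models reviewed in [76]; the parameter names (`q`, `omega`) are assumptions, where `q` stands for the product of a binding constant and the TF concentration.

```python
def site_occupancy(q):
    """Probability that a single site is bound, with q = K * [TF]."""
    return q / (1.0 + q)


def both_bound(q1, q2, omega=1.0):
    """Probability that two sites are bound simultaneously.

    omega > 1 models cooperative binding, omega = 1 independent binding.
    """
    # Partition function over four states: empty, site 1 only,
    # site 2 only, and both sites bound.
    z = 1.0 + q1 + q2 + omega * q1 * q2
    return omega * q1 * q2 / z


independent = both_bound(1.0, 1.0, omega=1.0)
cooperative = both_bound(1.0, 1.0, omega=10.0)
print(f"independent: {independent:.3f}, cooperative: {cooperative:.3f}")
```

With `omega = 1` the joint probability reduces to the product of the two single-site occupancies, which is a useful sanity check before adding repressors, cofactors, or more sites.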

Universality of the cis-regulatory code

In contrast to the universality of the genetic code, the evolutionary distance between species that are able to correctly interpret the enhancer sequences of each other is more limited and difficult to assess. This is because cell types and their complements of trans-acting regulators (which interpret the cis-regulatory sequences) are often restricted to certain phyla and are themselves subject to evolutionary change (see [83]).

Homologous cis-regulatory sequences have often been found to

The cis-regulatory code and evolution

The modularity of enhancer function and the flexibility and redundancy of the cis-regulatory code, especially in comparison with the genetic code, can explain both its functional robustness and the apparent ease with which sequence changes can alter gene expression. This has important consequences for evolutionary dynamics, and changes in the transcriptional regulation of genes are considered to be one of the major drivers for morphological evolution [14]. In particular, the contribution of

The road ahead: enhancer elements, their genomic context, and enhancer–promoter interactions

Transcriptional enhancers have fascinated and puzzled researchers since their initial discovery more than 30 years ago [94]. Although many characteristics of enhancers have been uncovered since, there is still no comprehensive picture of the necessary sequence features for even the most well-studied enhancers, and the de novo creation of an enhancer from non-functional sequence by the addition of such features has yet to be achieved. Similarly incomplete is the picture of the abundance and

Acknowledgments

We would like to thank Hannes Tkadletz (IMP/IMBA Graphics Department) for help with the figures and the anonymous reviewers for their helpful comments. We apologize to the many scientists whose work we could not cite owing to formal restrictions. Our work is supported by a European Research Council (ERC) Starting Grant from the European Community's Seventh Framework Programme (FP7/2007-2013)/ERC grant agreement no. 242922 awarded to A.S. and by the Austrian Ministry for Science and

References (120)

  • B. Deplancke

    A gene-centered C. elegans protein–DNA interaction network

    Cell

    (2006)
  • A.C. Mullen

    Master transcription factors determine cell-type-specific responses to TGF-β signaling

    Cell

    (2011)
  • E. Trompouki

    Lineage regulators direct BMP and Wnt pathways to cell-specific programs during differentiation and regeneration

    Cell

    (2011)
  • S. Heinz

    Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities

    Mol. Cell

    (2010)
  • P. Khoueiry

    A cis-regulatory signature in ascidians and flies, independent of transcription factor binding sites

    Curr. Biol.

    (2010)
  • M.A. Beer et al.

    Predicting gene expression from sequence

    Cell

    (2004)
  • J. Banerji

    Expression of a beta-globin gene is enhanced by remote SV40 DNA sequences

    Cell

    (1981)
  • E.Z. Kvon

    HOT regions function as patterned developmental enhancers and have a distinct cis-regulatory signature

    Genes Dev.

    (2012)
  • M. Lagha

    Mechanisms of transcriptional precision in animal development

    Trends Genet.

    (2012)
  • M.D. Wilson

    Species-specific transcription in mice carrying human chromosome 21

    Science

    (2008)
  • J.O. Yanez-Cuna

    Uncovering cis-regulatory sequence requirements for context specific transcription factor binding

    Genome Res.

    (2012)
  • F. Lienert

    Identification of genetic elements that autonomously determine DNA methylation states

    Nat. Genet.

    (2011)
  • S. Istrail et al.

    Logic functions of the genomic cis-regulatory code

    Proc. Natl. Acad. Sci. U.S.A.

    (2005)
  • M. Kasowski

    Variation in transcription factor binding among humans

    Science

    (2010)
  • T.E. Reddy

    Effects of sequence variation on differential allelic transcription factor occupancy and gene expression

    Genome Res.

    (2012)
  • R.K. Bradley

    Binding site turnover produces pervasive quantitative changes in transcription factor binding between closely related Drosophila species

    PLoS Biol.

    (2010)
  • Q. He

    High conservation of transcription factor binding and evidence for combinatorial regulation across six Drosophila species

    Nat. Genet.

    (2011)
  • D. Schmidt

    Five-vertebrate ChIP-seq reveals the evolutionary dynamics of transcription factor binding

    Science

    (2010)
  • C.D. Brown

    Functional architecture and evolution of transcriptional elements that drive gene coexpression

    Science

    (2007)
  • R.P. Zinzen

    Combinatorial binding predicts spatio-temporal cis-regulatory activity

    Nature

    (2009)
  • M.Z. Ludwig

    Functional evolution of a cis-regulatory module

    PLoS Biol.

    (2005)
  • M.Z. Ludwig

    Consequences of eukaryotic enhancer architecture for gene expression dynamics, development, and fitness

    PLoS Genet.

    (2011)
  • J.F. Etchberger

    The molecular signature and cis-regulatory architecture of a C. elegans gustatory neuron

    Genes Dev.

    (2007)
  • A. Erives et al.

    Coordinate enhancers share common organizational features in the Drosophila genome

    Proc. Natl. Acad. Sci. U.S.A.

    (2004)
  • M. Markstein

    A regulatory code for neurogenic gene expression in the Drosophila embryo

    Development

    (2004)
  • L.P.M. Andrioli

    Anterior repression of a Drosophila stripe enhancer requires three position-specific mechanisms

    Development

    (2002)
  • T.N. Mavrich

    Nucleosome organization in the Drosophila genome

    Nature

    (2008)
  • S.C.J. Parker

    Local DNA topography correlates with functional noncoding regions of the human genome

    Science

    (2009)
  • A. Melnikov

    Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay

    Nat. Biotechnol.

    (2012)
  • R.P. Patwardhan

    Massively parallel functional dissection of mammalian enhancers in vivo

    Nat. Biotechnol.

    (2012)
  • E. Sharon

    Inferring gene regulatory logic from high-throughput measurements of thousands of systematically designed promoters

    Nat. Biotechnol.

    (2012)
  • D.S. Parker

    The cis-regulatory logic of Hedgehog gradient responses: key roles for gli binding affinity, competition, and cooperativity

    Sci. Signal.

    (2011)
  • G.D. Stormo et al.

    Determining the specificity of protein–DNA interactions

    Nat. Rev. Genet.

    (2010)
  • G.D. Stormo

    Motif discovery using expectation maximization and Gibbs’ sampling

    Methods Mol. Biol.

    (2010)
  • K.D. MacIsaac et al.

    Practical strategies for discovering regulatory DNA sequence motifs

    PLoS Comput. Biol.

    (2006)
  • S. Aerts

    Robust target gene discovery through transcriptome perturbations and genome-wide enhancer predictions in Drosophila uncovers a regulatory basis for sensory specification

    PLoS Biol.

    (2010)
  • J.B. Warner

    Systematic identification of mammalian regulatory motifs’ target genes and functions

    Nat. Methods

    (2008)
  • J.R. ten Bosch

    The TAGteam DNA motif controls the timing of Drosophila pre-blastoderm transcription

    Development

    (2006)
  • H-L. Liang

    The zinc-finger protein Zelda is a key activator of the early zygotic genome in Drosophila

    Nature

    (2008)
  • X-Y. Li

    Transcription factors bind thousands of active and inactive regions in the Drosophila blastoderm

    PLoS Biol.

    (2008)
    *

    These authors contributed equally.
