Trends in Genetics
Review
Deciphering the transcriptional cis-regulatory code
Section snippets
Regulatory information is encoded in defined DNA sequence elements
During development, a single precursor cell (the fertilized egg) gives rise to a complex multicellular organism comprising a large variety of cell types. This process is heritable and repeats generation after generation according to a developmental program that is encoded in the genome. This program governs the dynamic regulation of gene expression in response to environmental and developmental stimuli and thus determines the differentiation of cell types, their morphologies and functions, and
Understanding the cis-regulatory code
The comparison between protein-coding and regulatory DNA sequences, that is, between the genetic and the regulatory code, is interesting and instructive. Proteins are encoded in open reading frames (ORFs) that are defined by start and stop codons in between which the ungapped sequential occurrence of 61 nucleotide triplets or ‘codons’ determines the linear amino acid sequence of the protein. The genetic code therefore amounts to a simple mapping of all 4^3 (= 64) possible triplets redundantly to
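The size of this mapping can be checked directly. The Python sketch below enumerates all 4^3 triplets and assigns them using the standard genetic code, written as the conventional 64-letter string in T, C, A, G base order. The helper names (`CODON_TABLE`, `translate`) are illustrative and do not come from the article.

```python
from itertools import product

# Standard genetic code as a 64-letter string; '*' marks a stop codon.
# Triplets are enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

STOP_CODONS = {codon for codon, aa in CODON_TABLE.items() if aa == "*"}
SENSE_CODONS = {codon for codon, aa in CODON_TABLE.items() if aa != "*"}


def translate(orf):
    """Translate an ungapped ORF codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE[orf[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy ORF made up for illustration: start codon, one alanine codon, stop codon.
example_protein = translate("ATGGCCTAA")
```

The table has 64 entries, of which 61 are sense codons that map redundantly onto 20 amino acids. This fixed, position-by-position lookup is what makes the genetic code simple compared with regulatory sequence.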
Enumerating the parts
The classical approach to study the rules that govern enhancer function has been the exhaustive characterization of individual enhancers, such as eve stripe 2 (Figure 1a,b; see [23] and references therein) and sparkling in Drosophila [18, 24] or the interferon-beta enhancer in mammals [25]. These studies established much of what is known today about the genomic properties of enhancers and suggested that the main ‘building blocks’ of the cis-regulatory code are TF motifs; the presence of certain
Building block I: TF motifs
TF motifs capture the DNA sequence preferences of TFs. They are typically short and degenerate and can be represented by IUPAC consensus sequences or, more flexibly, by position-specific weight matrices (PWMs). TF motifs are found in enhancer sequences and their importance for enhancer function has often been demonstrated by the combination of genetic and biochemical approaches. For example, in-depth analysis identified motifs for five TFs in eve stripe 2, arguably the most
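As a minimal sketch of how a PWM represents a degenerate motif, the Python code below builds a log-odds matrix from a handful of aligned sites and scans a sequence with it. The sites and the scanned sequence are invented for illustration, and the pseudocount and uniform background are assumptions. This is one common way to build and apply a PWM, not the procedure of any study cited here.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Return one {base: log2-odds} dict per motif position."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append(
            {base: math.log2(counts[base] / total / background) for base in BASES}
        )
    return pwm


def score_window(pwm, window):
    """Sum the per-position log-odds scores for one window."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan(pwm, sequence):
    """Score every window of motif width along the sequence."""
    width = len(pwm)
    return [
        (start, score_window(pwm, sequence[start:start + width]))
        for start in range(len(sequence) - width + 1)
    ]


# Hypothetical aligned binding sites (not real data).
SITES = ["TTAATCC", "TTAATCC", "CTAATCC", "TTAATCG", "TAAATCC"]
pwm = build_pwm(SITES)

# Hypothetical sequence with the consensus embedded at offset 4.
hits = scan(pwm, "GGCGTTAATCCGAGC")
best_pos, best_score = max(hits, key=lambda hit: hit[1])
```

Unlike an IUPAC consensus, which gives a yes-or-no match, the PWM assigns every window a graded score, so weaker variants of the motif can still be ranked.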
Building block II: TF cooperativity
Interestingly, in vivo TF binding studies using ChIP revealed two surprises: a TF binds to only a small fraction of all motif occurrences in the genome, and the binding sites differ between contexts (i.e., tissue or cell type), suggesting that the TF motif alone is not sufficient to direct in vivo binding (e.g., [5, 56, 57, 58] and references therein).
Some of these differences may be due to the presence of additional factors required for TF binding in vivo. Indeed, the comparison of binding sites
Building block III: additional enhancer sequence features
The detailed analysis of individual enhancers, such as the enhancer sparkling, which drives the expression of shaven [the Drosophila homolog of vertebrate paired box 2 (Pax2)] in cone cells of the developing Drosophila eye, has revealed the complexity of enhancer sequences [18, 24]: even after the detailed dissection of all binding sites for relevant TFs, the sequences between these were found to be essential [18]. This suggests that much of what is needed for enhancer activity is still unknown,
Assembling the building blocks: ‘Enhancer Grammar’
Detailed dissections of individual enhancers have also provided insights into the relative arrangements of TF motifs: within the eve stripe 2 enhancer, for example, the precise positioning and orientation of TF binding sites appears to be less important than the combined input of the TFs. Such flexibility has also been observed for other developmental enhancers (e.g., [18, 24, 61]) and might be a general property of the regulatory code (Figure 2b; see also below). In sharp contrast to this model
Predicting regulatory function from the building blocks
If gene expression is determined by a regulatory code that can be generalized across different enhancers with similar functions, it should be possible to learn rules (e.g., about the presence of TF motifs or binding sites) from known enhancers (training set) and predict the functionality of previously unseen sequences (test set) (Figure 4a). If truly independent sequences are used during training and testing, such ‘cross-validated’ predictions allow powerful conclusions to be drawn; in addition
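The train-and-test logic can be sketched in a few lines of Python. The example below assumes each sequence is summarized by counts of two hypothetical TF motifs and uses a nearest-centroid rule as a stand-in classifier. The feature values, labels, and the choice of classifier are all invented for illustration and do not come from the article.

```python
import random


def k_fold_splits(n_items, k, seed=0):
    """Yield (train, test) index lists; each item is tested exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def centroid(rows):
    """Mean feature vector of a list of equally long rows."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cross_validated_accuracy(features, labels, k=5):
    """Learn class centroids on training folds, predict held-out sequences."""
    correct = 0
    for train, test in k_fold_splits(len(features), k):
        active = centroid([features[i] for i in train if labels[i] == 1])
        inactive = centroid([features[i] for i in train if labels[i] == 0])
        for i in test:
            is_active = sq_dist(features[i], active) < sq_dist(features[i], inactive)
            correct += int(is_active) == labels[i]
    return correct / len(features)


# Toy data: [count of motif A, count of motif B] per sequence (made up).
FEATURES = [[4, 3], [5, 2], [3, 4], [4, 4], [5, 3],
            [1, 0], [0, 1], [1, 1], [0, 0], [2, 1]]
LABELS = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # 1 = active enhancer, 0 = inactive

accuracy = cross_validated_accuracy(FEATURES, LABELS, k=5)
```

The key property is that no sequence is used to fit the centroids that later classify it. With real enhancers, independence also requires that related or overlapping sequences do not end up on both sides of a split.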
Towards a mechanistic and quantitative understanding of enhancers
Several modeling approaches based on thermodynamics or logistic regression have been used to predict expression and to seek a quantitative and mechanistic understanding of enhancer function and gene expression from first principles (reviewed in [76]; Figure 4b). Such approaches are informed by biological and biophysical knowledge, and attempt to model the binding of activators and repressors (TFs) to the DNA, the recruitment of intermediate proteins, such as cofactors, or mediator components,
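To make the idea concrete, the Python sketch below combines the two ingredients named above in a deliberately simplified form: equilibrium occupancy of each binding site (a thermodynamic term) feeds a logistic function that outputs a relative expression level. It assumes independent sites with no cooperativity or competition, and all affinities, effect weights, and TF names are hypothetical. It is not the model of any study cited here.

```python
import math


def occupancy(concentration, affinity):
    """Equilibrium occupancy of a single site: K*c / (1 + K*c)."""
    weight = affinity * concentration
    return weight / (1.0 + weight)


def predicted_expression(tf_levels, sites, bias=-2.0):
    """Logistic output of summed site contributions.

    sites: list of (tf_name, affinity, effect); effect > 0 for an activator
    site and effect < 0 for a repressor site. All values are assumptions.
    """
    total = bias
    for tf_name, affinity, effect in sites:
        total += effect * occupancy(tf_levels[tf_name], affinity)
    return 1.0 / (1.0 + math.exp(-total))


# Hypothetical enhancer: two activator sites and one repressor site.
ENHANCER = [("activator", 2.0, 4.0), ("activator", 0.5, 4.0), ("repressor", 5.0, -6.0)]

# Same activator level, without and with repressor present.
expr_on = predicted_expression({"activator": 1.0, "repressor": 0.0}, ENHANCER)
expr_off = predicted_expression({"activator": 1.0, "repressor": 1.0}, ENHANCER)
```

Even this toy version shows how the same enhancer sequence gives different outputs depending on the concentrations of the trans-acting factors, which is what such models try to capture quantitatively.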
Universality of the cis-regulatory code
In contrast to the universality of the genetic code, the evolutionary distance between species that are able to correctly interpret each other's enhancer sequences is more limited and difficult to assess. This is because cell types and their complements of trans-acting regulators (which interpret the cis-regulatory sequences) are often restricted to certain phyla and are themselves subject to evolutionary change (see [83]).
Homologous cis-regulatory sequences have often been found to
The cis-regulatory code and evolution
The modularity of enhancer function and the flexibility and redundancy of the cis-regulatory code, especially in comparison with the genetic code, can explain both its functional robustness and the apparent ease with which sequence changes can alter gene expression. This has important consequences for evolutionary dynamics, and changes in the transcriptional regulation of genes are considered to be one of the major drivers for morphological evolution [14]. In particular, the contribution of
The road ahead: enhancer elements, their genomic context, and enhancer–promoter interactions
Transcriptional enhancers have fascinated and puzzled researchers since their initial discovery more than 30 years ago [94]. Although many characteristics of enhancers have been uncovered since, there is still no comprehensive picture of the necessary sequence features for even the best-studied enhancers, and the de novo creation of an enhancer from non-functional sequence by the addition of such features has yet to be achieved. Similarly incomplete is the picture of the abundance and
Acknowledgments
We would like to thank Hannes Tkadletz (IMP/IMBA Graphics Department) for help with the figures and the anonymous reviewers for their helpful comments. We apologize to the many scientists whose work we could not cite owing to formal restrictions. Our work is supported by a European Research Council (ERC) Starting Grant from the European Community's Seventh Framework Programme (FP7/2007-2013)/ERC grant agreement no. 242922 awarded to A.S. and by the Austrian Ministry for Science and
References (120)
- et al.
Developmental gene regulation in the era of genomics
Dev. Biol.
(2010)
- et al.
Comparative genomics of gene regulation-conservation and divergence of cis-regulatory information
Curr. Opin. Genet. Dev.
(2009)
Evo-devo and an expanding evolutionary synthesis: a genetic theory of morphological evolution
Cell
(2008)
Structural rules and complex regulatory circuitry constrain expression of a Notch- and EGFR-regulated eye enhancer
Dev. Cell
(2010)
Rapid evolutionary rewiring of a structurally constrained eye enhancer
Curr. Biol.
(2011)
- et al.
Virus induction of human IFN beta gene expression requires the assembly of an enhanceosome
Cell
(1995)
Computational models for neurogenic gene expression in the Drosophila embryo
Curr. Biol.
(2006)
- et al.
Comprehensive genome-wide protein–DNA interactions detected at single-nucleotide resolution
Cell
(2011)
A library of yeast transcription factor motifs reveals a widespread function for Rsc3 in targeting nucleosome exclusion at promoters
Mol. Cell
(2008)
Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences
Cell
(2008)
A gene-centered C. elegans protein–DNA interaction network
Cell
Master transcription factors determine cell-type-specific responses to TGF-β signaling
Cell
Lineage regulators direct BMP and Wnt pathways to cell-specific programs during differentiation and regeneration
Cell
Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities
Mol. Cell
A cis-regulatory signature in ascidians and flies, independent of transcription factor binding sites
Curr. Biol.
Predicting gene expression from sequence
Cell
Expression of a beta-globin gene is enhanced by remote SV40 DNA sequences
Cell
HOT regions function as patterned developmental enhancers and have a distinct cis-regulatory signature
Genes Dev.
Mechanisms of transcriptional precision in animal development
Trends Genet.
Species-specific transcription in mice carrying human chromosome 21
Science
Uncovering cis-regulatory sequence requirements for context specific transcription factor binding
Genome Res.
Identification of genetic elements that autonomously determine DNA methylation states
Nat. Genet.
Logic functions of the genomic cis-regulatory code
Proc. Natl. Acad. Sci. U.S.A.
Variation in transcription factor binding among humans
Science
Effects of sequence variation on differential allelic transcription factor occupancy and gene expression
Genome Res.
Binding site turnover produces pervasive quantitative changes in transcription factor binding between closely related Drosophila species
PLoS Biol.
High conservation of transcription factor binding and evidence for combinatorial regulation across six Drosophila species
Nat. Genet.
Five-vertebrate ChIP-seq reveals the evolutionary dynamics of transcription factor binding
Science
Functional architecture and evolution of transcriptional elements that drive gene coexpression
Science
Combinatorial binding predicts spatio-temporal cis-regulatory activity
Nature
Functional evolution of a cis-regulatory module
PLoS Biol.
Consequences of eukaryotic enhancer architecture for gene expression dynamics, development, and fitness
PLoS Genet.
The molecular signature and cis-regulatory architecture of a C. elegans gustatory neuron
Genes Dev.
Coordinate enhancers share common organizational features in the Drosophila genome
Proc. Natl. Acad. Sci. U.S.A.
A regulatory code for neurogenic gene expression in the Drosophila embryo
Development
Anterior repression of a Drosophila stripe enhancer requires three position-specific mechanisms
Development
Nucleosome organization in the Drosophila genome
Nature
Local DNA topography correlates with functional noncoding regions of the human genome
Science
Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay
Nat. Biotechnol.
Massively parallel functional dissection of mammalian enhancers in vivo
Nat. Biotechnol.
Inferring gene regulatory logic from high-throughput measurements of thousands of systematically designed promoters
Nat. Biotechnol.
The cis-regulatory logic of Hedgehog gradient responses: key roles for gli binding affinity, competition, and cooperativity
Sci. Signal.
Determining the specificity of protein–DNA interactions
Nat. Rev. Genet.
Motif discovery using expectation maximization and Gibbs’ sampling
Methods Mol. Biol.
Practical strategies for discovering regulatory DNA sequence motifs
PLoS Comput. Biol.
Robust target gene discovery through transcriptome perturbations and genome-wide enhancer predictions in Drosophila uncovers a regulatory basis for sensory specification
PLoS Biol.
Systematic identification of mammalian regulatory motifs’ target genes and functions
Nat. Methods
The TAGteam DNA motif controls the timing of Drosophila pre-blastoderm transcription
Development
The zinc-finger protein Zelda is a key activator of the early zygotic genome in Drosophila
Nature
Transcription factors bind thousands of active and inactive regions in the Drosophila blastoderm
PLoS Biol.
* These authors contributed equally.