Transcription factors are proteins that bind to DNA, typically upstream of and close to the transcription start site of a gene, and regulate the expression of that gene by activating or inhibiting the transcription machinery.
Transcription factors contain several functional regions:
* Activation domain: region that interacts with other parts of the transcription machinery (RNA polymerase or other transcription factors).
* DNA binding domain: amino acids in the protein that recognize specific bases near the start of transcription.
* Nuclear localization domain: region that serves as a signal for the protein to go to the nucleus after being synthesized in the cytoplasm.
* Dimerization domain: Many transcription factors work as dimers (two subunits). For these proteins, a region of the protein facilitates interaction with another subunit.
The figure shows several transcription factors (JUN, FOS, Sp1, and basal factors) that are necessary for transcription of some genes.
Computational approaches to identifying transcription factor binding sites have come in two flavors. One class of methods looks for overrepresented motifs in sequences that are believed to contain several binding sites for the same factor (such as promoters of co-regulated genes). The second class of methods identifies motifs that are significantly conserved in orthologous sequences, e.g., promoters of the same gene in different species. Yet the computational prediction of such regulatory elements remains a challenging task.
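As a rough illustration of the first class of methods, the sketch below scores every k-mer by how much more often it occurs in a set of co-regulated promoters than in a background set. It is a toy enrichment ratio, not the algorithm of any tool listed on this page; the function names, the pseudocount, and the example sequences are assumptions made for illustration.

```python
from collections import Counter


def kmer_counts(sequences, k):
    """Count, for each k-mer, how many sequences contain it at least once."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(seen)
    return counts


def overrepresented_kmers(foreground, background, k=6, pseudocount=1.0):
    """Rank k-mers by the ratio of foreground to background sequence frequency.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for k-mers that never occur in the background.
    """
    fg = kmer_counts(foreground, k)
    bg = kmer_counts(background, k)
    scores = {}
    for kmer, n_fg in fg.items():
        fg_freq = (n_fg + pseudocount) / (len(foreground) + 2 * pseudocount)
        bg_freq = (bg.get(kmer, 0) + pseudocount) / (len(background) + 2 * pseudocount)
        scores[kmer] = fg_freq / bg_freq
    # Highest enrichment first; ties broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up toy sequences: the foreground "promoters" all share one 7-mer.
foreground = ["AAATGACTCAGG", "CCTGACTCATTT", "GTGACTCACGAC"]
background = ["AAACCCGGGTTT", "ACGTACGTACGT", "GGGAAATTTCCC", "TTTGGGCCCAAA"]
for kmer, score in overrepresented_kmers(foreground, background, k=7)[:3]:
    print(kmer, round(score, 2))
```

Real motif finders use proper statistical models (and allow mismatches within a motif), so a ratio like this is only a starting point for intuition.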
Even though numerous tools are available for this task, they should be used with caution. According to the assessment cited under Reference, how well each tool performs depends on the dataset.
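The second class of methods described above, conservation in orthologous sequences, can be sketched in a similarly simplified way. The example assumes the orthologous promoters have already been aligned to equal length (with `-` for gaps) and simply reports windows that are identical in every species. The function name, the window width, and the sequences are hypothetical; actual phylogenetic footprinting tools also model the evolutionary distances between species.

```python
def conserved_windows(aligned, width=6):
    """Return (start, motif) for each gap-free window identical in all sequences.

    `aligned` is a list of pre-aligned sequences of equal length; producing
    that alignment is outside the scope of this sketch.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    seqs = [s.upper() for s in aligned]
    hits = []
    for start in range(length - width + 1):
        window = seqs[0][start:start + width]
        if "-" in window:
            continue  # skip windows that contain alignment gaps
        if all(s[start:start + width] == window for s in seqs[1:]):
            hits.append((start, window))
    return hits


# Made-up aligned promoter fragments from three hypothetical species.
aligned = ["ACGTTGACTCAGGA",
           "TTGATGACTCACCT",
           "GCAATGACTCATTG"]
print(conserved_windows(aligned, width=7))
```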
Transcription factor databases

Transcription factors database
ftp://ftp.ncbi.nih.gov/repository/TFD/

Eukaryotic transcription factor database
TRANSFAC - contains data on transcription factors, their experimentally proven binding sites, and regulated genes. Its broad compilation of binding sites allows the derivation of positional weight matrices.
http://www.gene-regulation.com/pub/databases.html#transfac

Plant transcription factor database
http://plntfdb.bio.uni-potsdam.de/v1.0/

PLACE
Database of motifs found in plant cis-acting regulatory DNA elements, all from previously published reports. It covers vascular plants only.
http://www.dna.affrc.go.jp/PLACE/

PlantProm DB
Database with an annotated, non-redundant collection of proximal promoter sequences for RNA polymerase II with experimentally determined transcription start sites (TSS) from various plant species.
http://www.softberry.com

PlantCare
Database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences.
http://bioinformatics.psb.ugent.be/webtools/plantcare/html/

DoOP: Databases of Orthologous Promoters
A database containing orthologous clusters of promoters from Homo sapiens, Arabidopsis thaliana and other organisms.
http://doop.abc.hu/

DATF: Database of Arabidopsis Transcription Factors
The Database of Arabidopsis Transcription Factors (DATF) contains known and predicted Arabidopsis transcription factors with sequences and many other features, including 3D structure templates, EST expression information, transcription factor binding sites and nuclear localization signals.
http://datf.cbi.pku.edu.cn/

AtProbe
The Arabidopsis thaliana promoter binding element database, an aid to find binding elements and check data against the primary literature.
http://exon.cshl.org/cgi-bin/atprobe/atprobe.pl

AthaMap
A genome-wide map of putative transcription factor binding sites in Arabidopsis thaliana.
http://www.athamap.de/

AGRIS
Contains two databases, AtcisDB (Arabidopsis thaliana cis-regulatory database) and AtTFDB (Arabidopsis thaliana transcription factor database).
http://arabidopsis.med.ohio-state.edu/

Prediction tools

Weeder - For all eukaryotic datasets
http://159.149.109.16:8080/weederWeb/index2.html

oligo/dyad analysis & ANN-Spec - for the human dataset
http://rsat.scmbb.ulb.ac.be/rsat/

SeSiMCMC - performs better for the fly dataset
http://favorov.imb.ac.ru/cgi-bin/gibbslfm/gibbslfm.pl?action=form

MEME3 & YMF - perform better for the mouse dataset
http://meme.sdsc.edu/meme/intro.html
http://wingless.cs.washington.edu/YMF/YMFWeb/YMFInput.pl

MotifSampler - performs better for real experimental datasets
http://homes.esat.kuleuven.be/~thijs/Work/MotifSampler.html

PhyME - Good for comparative sequence analysis (also known as phylogenetic footprinting)
http://edsc.rockefeller.edu/cgi-bin/phyme/download.pl

It is advisable to use a few complementary tools in combination rather than relying on a single one.
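One simple way to act on this advice is to keep only the predicted sites that at least two tools agree on. The sketch below assumes each tool's output has already been converted to a list of `(sequence_id, start, end)` tuples with an exclusive end; that format, the function names, and the tool labels are hypothetical, since each real tool has its own output format that would need parsing first.

```python
def overlaps(a, b):
    """True if two (seq_id, start, end) sites on the same sequence share a position."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def supported_sites(predictions, min_tools=2):
    """Keep sites whose position is predicted by at least `min_tools` tools.

    `predictions` maps a tool label to its list of (seq_id, start, end) sites.
    Returns a sorted list of (site, supporting_tool_labels).
    """
    results = []
    for tool, sites in predictions.items():
        for site in sites:
            support = {tool}
            for other, other_sites in predictions.items():
                if other != tool and any(overlaps(site, o) for o in other_sites):
                    support.add(other)
            if len(support) >= min_tools:
                results.append((site, sorted(support)))
    return sorted(results)


# Made-up predictions from three hypothetical tools.
predictions = {
    "tool_a": [("geneX", 10, 18), ("geneY", 40, 48)],
    "tool_b": [("geneX", 12, 20)],
    "tool_c": [("geneY", 100, 108)],
}
for site, tools in supported_sites(predictions):
    print(site, tools)
```

Requiring agreement trades sensitivity for specificity: sites found by only one tool are dropped even if they are real, so the threshold is a choice to tune rather than a rule.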
Other tools:

AlignACE:
http://atlas.med.harvard.edu/

Consensus:
http://bifrost.wustl.edu/consensus

GLAM:
http://zlab.bu.edu/glam

MITRA:
http://www.calit2.net.combio/mitral

QuickScore:
http://aglo.inria.fr/dolley/quickscore

Reference:
Assessing computational tools for the discovery of transcription factor binding sites. Nat Biotechnol. 2005 Jan;23(1):137-44.