While one can use established lists of motifs to search a DNA sequence, one can also discover motifs directly. To do this, one has to derive a consensus sequence or probability matrix. For bacterial proteins whose binding sites have been determined, good places to start are the E. coli DNA-Binding Site Matrices (A.M. McGuire, Harvard University, U.S.A.) and DBTBS: a database of transcriptional regulation in Bacillus subtilis (University of Tokyo, Japan). These sites provide a training set which can be used to derive a Gibbs screening matrix.

See additional pages on Promoters, Terminators, and Transcriptional Factors.

An assessment of a set of motif identifiers can be found in Nature Biotechnology, 2005, 23(1):137-144.

Gibbs Motif Sampler Homepage (E.C. Rouchka and B. Thompson, Bioinformatics Laboratory of Wadsworth Center, U.S.A.) - I have linked to the prokaryotic DNA default setting page. On the next page I have presented data for the IHF-binding site (consensus: WWWTCAA[N4]TTR).
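The IHF consensus is written in IUPAC ambiguity codes (W = A or T, R = A or G, N = any base). As a minimal sketch, assuming one only wants to locate exact matches to such a consensus on one strand, the codes can be expanded into a regular expression. The function names and the demo sequence are made up for illustration and are not part of the Gibbs Motif Sampler; [N4] is written out as NNNN.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(consensus):
    """Expand an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[code] for code in consensus.upper())

def find_sites(consensus, seq):
    """Return (0-based start, matched text) for every match, overlaps included."""
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile("(?=(" + iupac_to_regex(consensus) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]

IHF = "WWWTCAANNNNTTR"              # WWWTCAA[N4]TTR with the spacer written out
demo = "GGCATATCAAGCGATTGCC"        # hypothetical test sequence
print(find_sites(IHF, demo))
```

This only finds perfect matches; a probability matrix, as produced by the sampler, is needed to score imperfect sites.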

RSA-tools - info-Gibbs (A. Neuwald & Jacques van Helden, Service de Conformation des Macromolécules Biologiques et de Bioinformatique, Université Libre de Bruxelles, Belgium) - type in the matrix size desired and deselect "add reverse complement strand."  After running the program once I would delete those sequences from the discovery set which align imperfectly.

Create Matrix File (J. Zheng, Queen's University, Canada) - creates a matrix from a DNA Clustal alignment and also presents the consensus:

Number of sequences: 11
Length of alignment: 29
Consensus sequence representing: 80% matching base(s)

A  0  9 11 10  0  4  1  1  2  1  0  1  2  5  1  2  2  2  1  2  1  0  0 11  0 10  8  3  4
C  0  1  0  0  2  0  1  2  2  5  7  5  3  1  4  4  6  3  1  0 10 11  0  0  1  0  0  4  3
G  0  0  0  1  8  1  0  0  3  3  1  2  4  2  1  2  3  3  3  9  0  0 11  0  0  1  1  0  2
T 11  1  0  0  1  6  9  8  4  2  3  3  2  3  5  3  0  3  6  0  0  0  0  0 10  0  2  4  2

   T  A  A  A  S  W  T  Y  D  B  Y  B  V  D  Y  H  S  B  K  G  C  C  G  A  T  A  W  H  V
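To show how a consensus line like this can follow from the counts, here is a minimal sketch. The rule used (add the most frequent bases in a column until at least 80% of the sequences are covered, breaking ties in A, C, G, T order) is my assumption, not the documented behaviour of Create Matrix File, although it does reproduce the consensus shown for this matrix.

```python
# Count matrix copied from the output above (11 sequences, 29 columns).
COUNTS = {
    "A": [0, 9, 11, 10, 0, 4, 1, 1, 2, 1, 0, 1, 2, 5, 1, 2, 2, 2, 1, 2, 1, 0, 0, 11, 0, 10, 8, 3, 4],
    "C": [0, 1, 0, 0, 2, 0, 1, 2, 2, 5, 7, 5, 3, 1, 4, 4, 6, 3, 1, 0, 10, 11, 0, 0, 1, 0, 0, 4, 3],
    "G": [0, 0, 0, 1, 8, 1, 0, 0, 3, 3, 1, 2, 4, 2, 1, 2, 3, 3, 3, 9, 0, 0, 11, 0, 0, 1, 1, 0, 2],
    "T": [11, 1, 0, 0, 1, 6, 9, 8, 4, 2, 3, 3, 2, 3, 5, 3, 0, 3, 6, 0, 0, 0, 0, 0, 10, 0, 2, 4, 2],
}

# IUPAC code for each possible set of bases.
CODE = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}

def consensus(counts, threshold_pct=80):
    """Greedy consensus: smallest set of top bases covering threshold_pct of a column."""
    result = []
    for i in range(len(counts["A"])):
        column = {base: counts[base][i] for base in "ACGT"}
        total = sum(column.values())
        # sorted() is stable, so equal counts keep A, C, G, T order (assumed tie rule).
        ranked = sorted("ACGT", key=lambda base: -column[base])
        chosen, covered = set(), 0
        for base in ranked:
            chosen.add(base)
            covered += column[base]
            # Integer comparison avoids floating-point rounding at the threshold.
            if covered * 100 >= threshold_pct * total:
                break
        result.append(CODE[frozenset(chosen)])
    return "".join(result)

print(consensus(COUNTS))  # TAAASWTYDBYBVDYHSBKGCCGATAWHV
```

Lowering threshold_pct gives a less degenerate consensus; raising it pushes more columns towards N.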

DNA Motifs Gibbs Sampler - SeSiMCMC - the Sequence Similarities by Markov Chain Monte-Carlo algorithm finds DNA motifs of unknown length and complicated structure in a set of unaligned DNA sequences. It uses an improved motif length estimator and careful Bayesian analysis of the possibility of a site absence in a sequence. Reference: A.V. Favorov et al. 2005. Bioinformatics 21: 2240-2245.

You may also want to consider the MEME Suite.

FindTerm (Softberry Inc.) - only two tools exist on the internet for mapping rho-independent terminators: FindTerm and TransTerm. You might consider using the advanced feature options and increasing the default energy threshold slightly to -12.0.

Tools to find motif clusters in DNA sequences - one should probably start at ZLAB (Dr. Zhiping Weng, Boston University, U.S.A.), which has developed a wide range of tools for studying the interaction between regulatory proteins and their DNA/RNA target sites, including:


Find short split motifs in DNA sequences with YMF (Reference: Sinha, S. & Tompa, M. 2002. Nucl. Acids Res.)
Motif Sampler - tries to find over-represented motifs (cis-acting regulatory elements) in the upstream region of a set of co-regulated genes. This motif finding algorithm uses Gibbs sampling to find the position probability matrix that represents the motif. Be sure to "uncheck" the appropriate box if you don't want the complementary strand included in the analysis. (Reference: G. Thijs et al. 2002. J. Comput. Biol. 9: 447-464.)

extractUpStreamDNA (A. Villegas, Public Health Ontario) - takes a GenBank flatfile (*.gbk) as input and, for every CDS that it finds, extracts a pre-determined length of upstream DNA (the length is an argument and includes 3 nt for the initiation codon). The output is an FFN file of these upstream DNA sequences. N.B. this only works for prokaryotic sequences because it does not handle the splits or joins found in eukaryotic sequences. The data can then be analyzed with programs such as MEME.
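The coordinate arithmetic behind this kind of extraction can be sketched as follows. This is not the extractUpStreamDNA code: it assumes the CDS coordinates have already been parsed from the GenBank file (0-based start, end-exclusive, strand +1 or -1), treats the genome as linear, and uses a made-up function name and demo sequence.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def upstream_region(genome, start, end, strand, length):
    """Return `length` nt ending with the 3-nt initiation codon of a CDS.

    The CDS occupies genome[start:end] in forward-strand coordinates.
    Regions running off the end of the (linear) sequence are truncated.
    """
    if strand == 1:
        # Start codon is genome[start:start + 3]; walk back from its last base.
        low = max(0, start + 3 - length)
        return genome[low:start + 3]
    # Minus strand: the start codon sits at the high-coordinate end of the CDS.
    high = min(len(genome), end - 3 + length)
    return reverse_complement(genome[end - 3:high])

# Hypothetical 23-nt "genome" with a tiny CDS (ATGAAATAA) at positions 14..22.
genome = "CCTTAGGAGGTACCATGAAATAA"
print(upstream_region(genome, 14, 23, 1, 10))   # AGGTACCATG

# The same gene placed on the minus strand of the reverse-complemented genome.
flipped = reverse_complement(genome)
print(upstream_region(flipped, 0, 9, -1, 10))   # AGGTACCATG
```

Writing each result under a FASTA header line would give a file suitable as input for MEME.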

MelinaII - Motif Elucidator in Nucleotide Sequence Assembly (Human Genome Center, University of Tokyo, Japan) - helps one extract a set of common motifs shared by functionally-related DNA sequences. It utilizes CONSENSUS, GIBBS DNA, MEME and Coresearch, which are considered to be the most progressive motif search algorithms. Each algorithm is supplied with an impressive set of selection parameters.

SCOPE (Suite for Computational identification Of Promoter Elements), an ensemble of programs aimed at identifying novel cis-regulatory elements from groups of upstream sequences. (Reference: J.M. Carlson et al. 2007. Nucl. Acids Res. 35: W259-W264)

P2RP (Predicted Prokaryotic Regulatory Proteins) - predicts regulatory proteins, including transcription factors (TFs) and two-component systems (TCSs), based upon analysis of DNA or protein sequences. (Reference: M. Barakat et al. 2013. BMC Genomics 14: 269)

DMINDA - This server provides a suite of cis-regulatory motif analysis functions for DNA sequences. (Reference: Q. Ma et al. 2014. Nucleic Acids Res. 42(Web Server issue):W12-9.)

RegRNA 2.0 is an integrated web server for identifying functional RNA motifs in an input RNA sequence. These include splicing sites (donor site; acceptor site); splicing regulatory motifs (ESE; ESS; ISE; ISS elements); polyadenylation sites; transcriptional motifs (rho-independent terminator; TRANSFAC); translational motifs (ribosome binding sites); UTR motifs (UTRsite patterns); mRNA degradation elements (AU-rich elements); RNA editing sites (C-to-U editing sites); riboswitches (RiboSW); RNA cis-regulatory elements (Rfam; ERPIN); similar functional RNA sequences (fRNAdb); RNA-RNA interaction regions (miRNA; ncRNA). (Reference: Chang TH et al. 2013. BMC Bioinformatics 14 Suppl 2:S4).

RegRNA - A Regulatory RNA Motifs and Elements Finder - RegRNA is an integrated web server for identifying the homologs of regulatory RNA motifs and elements against an input mRNA sequence. Both sequence homologs and structural homologs of regulatory RNA motifs can be recognized. The regulatory RNA motifs supported in RegRNA are categorized into several classes: (i) motifs in mRNA 5'-untranslated region (5'-UTR) and 3'-UTR; (ii) motifs involved in mRNA splicing; (iii) motifs involved in transcriptional regulation; (iv) riboswitches; (v) splicing donor/acceptor sites; (vi) inverted repeats; and (vii) miRNA target sites. (Reference: Huang HY et al. 2006. Nucleic Acids Res. 34(Web Server issue):W429-34).

Signature (Institute of Bioinformatics, University of Georgia, U.S.A.) - finds under- and over-represented short oligonucleotides (di-, tri- and tetranucleotides) in a genome sequence.
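One common way to judge whether a dinucleotide is over- or under-represented is the ratio of its observed frequency to the frequency expected from the base composition alone. The sketch below illustrates that idea only; I am not claiming it is the statistic Signature computes. It looks at a single strand, assumes the sequence contains only A, C, G and T, and uses a made-up function name.

```python
from collections import Counter

def dinucleotide_odds(seq):
    """Observed/expected ratio for each dinucleotide present in seq.

    Expected frequency of XY is freq(X) * freq(Y) (independent bases).
    Ratios well above 1 suggest over-representation, well below 1 the opposite.
    """
    seq = seq.upper()
    n = len(seq)
    base_freq = {base: count / n for base, count in Counter(seq).items()}
    pair_counts = Counter(seq[i:i + 2] for i in range(n - 1))
    total_pairs = n - 1
    return {
        pair: (count / total_pairs) / (base_freq[pair[0]] * base_freq[pair[1]])
        for pair, count in pair_counts.items()
    }

# Hypothetical toy sequence; real use would be a whole genome.
for pair, ratio in sorted(dinucleotide_odds("ACGTACGTTTAA").items()):
    print(pair, round(ratio, 2))
```

On short sequences these ratios are very noisy, so the approach is only meaningful for genome-scale input.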

AIMIE (Ab Initio Motif Identification Environment) - this tool should be useful for picking up high-copy dispersed repeats, such as repeated extragenic palindrome (REP) elements, CRISPR repeats, uptake signal sequences (DUS/USS), intergenic dyad sequences and several other over-represented sequence motifs in genome sequences. (Reference: Mrázek, J. et al. 2008. Bioinformatics 24: 1041-1048).

fwDNA (Institute of Bioinformatics, University of Georgia, U.S.A.) - finds frequent words (oligonucleotides) in a genome sequence.
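At its simplest, finding frequent words means counting every overlapping k-mer and ranking the counts. The following is a minimal sketch of that idea, not a description of how fwDNA works internally; the function name and defaults are made up.

```python
from collections import Counter

def frequent_words(seq, k, top=5):
    """Return the `top` most common k-mers as (word, count) pairs.

    Words are counted on the given strand only, with overlaps, and
    words containing anything other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    words = (seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = Counter(word for word in words if set(word) <= set("ACGT"))
    return counts.most_common(top)

print(frequent_words("ATATATGC", 2, top=2))  # [('AT', 3), ('TA', 2)]
```

Raw counts favour words made of the genome's most common bases, so a frequent word is not necessarily an over-represented one.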

ASEQH (Analysis of Sequence Heterogeneity) (Institute of Bioinformatics, University of Georgia, U.S.A.) - allows users to generate sliding window plots of seven different sequence properties: G + C content; S3 : G + C at codon site 3; d* - differences with respect to genomic average; synonymous codon bias with respect to genomic average; amino acid composition differences with respect to genomic average; (G - C) / (G + C) : G-C skew; (A - T) / (A + T) : A-T skew. It is intended for analysis of prokaryotic genomes but it can be applied to eukaryotic chromosomes with some limitations.
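Two of these properties, G + C content and G-C skew, are simple enough to sketch by hand. The code below uses the (G - C) / (G + C) formula given above; the window and step sizes, the function name, and the choice to report a skew of 0 for windows without G or C are my own assumptions, not ASEQH settings.

```python
def sliding_window(seq, window, step):
    """Return (start, G+C fraction, G-C skew) for each full window.

    G+C fraction = (G + C) / window length
    G-C skew     = (G - C) / (G + C), reported as 0.0 if the window has no G or C
    """
    seq = seq.upper()
    rows = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        gc_content = (g + c) / window
        gc_skew = (g - c) / (g + c) if g + c else 0.0
        rows.append((start, gc_content, gc_skew))
    return rows

# Hypothetical toy sequence; genome-scale windows are typically thousands of nt.
for row in sliding_window("GGGGCCCCAAAA", window=4, step=4):
    print(row)
```

A-T skew follows the same pattern with A and T counts in place of G and C.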

PATLOC (Pattern Locator) (Institute of Bioinformatics, University of Georgia, U.S.A.) - is a new tool for finding sequence patterns in long DNA sequences. For this web-based service, a restricted version of Pattern Locator is used, which estimates the time needed for completion of the search and stops if the estimated CPU time exceeds a certain limit (currently 90 seconds). The CPU time limit was introduced in order to protect the web server from overloading due to requests involving overly complex sequence patterns. If you want to search for Sigma-70 (RpoD)-like promoters the pattern syntax for your search is: <>{TTGACA(N)[15:18]TATAAT}[4]. N.B. the [4] allows for 4 mismatches - I recommend a maximum of two. If you only want one strand screened omit the <> at the start. You can restrict the search to intergenic regions (but this will also eliminate matches that partially overlap with genes) or use the .patvic.txt output file to find where they are (Jan Mrázek, personal communication).
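To make clear what such a pattern asks for, here is a rough sketch of the same search: a TTGACA-like hexamer, a 15 to 18 nt spacer, then a TATAAT-like hexamer, with a cap on the total number of mismatches and an optional scan of the reverse strand. It is not Pattern Locator's algorithm; I am assuming the mismatch limit applies to both hexamers combined, and all names are made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def scan_strand(seq, box35="TTGACA", box10="TATAAT",
                spacers=range(15, 19), max_mismatch=2):
    """Return (start of -35 box, spacer length, total mismatches) for each hit."""
    hits = []
    for i in range(len(seq) - len(box35) + 1):
        mm35 = mismatches(seq[i:i + len(box35)], box35)
        if mm35 > max_mismatch:
            continue
        for spacer in spacers:
            j = i + len(box35) + spacer
            if j + len(box10) > len(seq):
                break  # longer spacers would also run off the end
            total = mm35 + mismatches(seq[j:j + len(box10)], box10)
            if total <= max_mismatch:
                hits.append((i, spacer, total))
    return hits

def find_promoters(seq, both_strands=True, **options):
    """Scan one or both strands; '-' positions are in reverse-complement coordinates."""
    seq = seq.upper()
    hits = [("+",) + hit for hit in scan_strand(seq, **options)]
    if both_strands:
        reverse = seq.translate(COMPLEMENT)[::-1]
        hits += [("-",) + hit for hit in scan_strand(reverse, **options)]
    return hits

# Hypothetical test sequence: perfect boxes separated by a 17-nt spacer.
demo = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "GG"
print(find_promoters(demo))  # [('+', 2, 17, 0)]
```

Setting both_strands=False corresponds to omitting the <> in the PATLOC pattern, and max_mismatch plays the role of the number in the trailing square brackets.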