DNA Motifs
While established lists of motifs can be used to search a DNA sequence, motifs can also be discovered directly. To do this, one has to derive a consensus sequence or a probability matrix from a set of known sites. For bacterial proteins whose binding sites have been determined, good places to start are the E. coli DNA-Binding Site Matrices (A.M. McGuire, Harvard University, U.S.A.) and DBTBS: a database of transcriptional regulation in Bacillus subtilis (University of Tokyo, Japan). These sites provide training sets that can be used to derive a screening matrix by Gibbs sampling.
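As a rough illustration of what deriving a matrix from a training set involves, the Python sketch below turns a handful of aligned, equal-length binding sites into a position probability matrix and reads off the consensus. The four sites are invented for the example (they are not taken from either database). The pseudocount of 0.5 is an arbitrary assumption that keeps unobserved bases from getting a probability of zero.

```python
from collections import Counter

BASES = "ACGT"

def probability_matrix(sites, pseudocount=0.5):
    """Position probability matrix from equal-length, aligned binding sites.

    Returns one dict per position mapping each base to its probability.
    The pseudocount (an arbitrary choice) keeps unseen bases above zero.
    """
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for i in range(width):
        counts = Counter(s[i].upper() for s in sites)
        total = len(sites) + 4 * pseudocount
        matrix.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return matrix

def consensus(matrix):
    """Most probable base at each position (ties go to the first base in ACGT order)."""
    return "".join(max(BASES, key=lambda b: col[b]) for col in matrix)

# Hypothetical training sites, invented for illustration only.
sites = ["TTGACA", "TTGACT", "TTTACA", "TTGATA"]
pm = probability_matrix(sites)
print(consensus(pm))  # TTGACA
```

The consensus discards information that the matrix keeps. For example, position 3 is G in three of the four sites but T in one, and the matrix records that variation while the consensus does not.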
See additional pages on Promoters, Terminators, and Transcriptional Factors.
Recent Review of Different Sequence Motif Finding Algorithms
(Reference: Hashim FA et al. 2019. Avicenna J Med Biotechnol. 11(2): 130–148)
RSAT
RSAT (Regulatory Sequence Analysis Tools) is a suite of modular tools for the detection and analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, including from genome-wide datasets such as ChIP-seq and ATAC-seq, (ii) motif scanning, (iii) motif analysis (quality assessment, comparison and clustering), (iv) analysis of regulatory variations, and (v) comparative genomics.
(Reference: Santana-Garcia W et al. 2022. Nucleic Acids Res. 50(W1): W670–W676)
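Motif scanning, one of the applications listed for RSAT, essentially slides a scoring matrix along a sequence. The sketch below is not RSAT code. It converts a position probability matrix into log2-odds scores against an assumed uniform background (0.25 per base), then reports every window on either strand whose score reaches a user-chosen threshold. The motif and the threshold of 8.0 in the usage lines are hypothetical.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def log_odds_matrix(prob_matrix, background=None):
    """Convert per-position base probabilities into log2(p / background) scores."""
    background = background or {b: 0.25 for b in BASES}  # assumption: uniform background
    return [{b: math.log2(col[b] / background[b]) for b in BASES} for col in prob_matrix]

def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]

def scan(seq, pssm, threshold):
    """Return (start, strand, score) for every window scoring >= threshold.

    Start positions are 0-based coordinates on the forward strand.
    """
    width = len(pssm)
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        for strand, site in (("+", window), ("-", reverse_complement(window))):
            score = sum(col[b] for col, b in zip(pssm, site))
            if score >= threshold:
                hits.append((start, strand, round(score, 3)))
    return hits

# Hypothetical motif: 0.85 for the listed base at each position, 0.05 for the others.
motif = [{b: (0.85 if b == c else 0.05) for b in BASES} for c in "TTGACA"]
print(scan("GGTTGACAGG", log_odds_matrix(motif), threshold=8.0))  # [(2, '+', 10.593)]
```

Scoring the reverse complement of each window is what lets a single matrix find sites on the complementary strand.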
The RSAT portal provides links to the following specialized servers:
- RSAT Fungi
- RSAT Prokaryotes
- RSAT Metazoa
- RSAT Protists
- RSAT Plants
You may also want to consider the MEME Suite.
Motif Sampler
Motif Sampler tries to find over-represented motifs (cis-acting regulatory elements) in the upstream regions of a set of co-regulated genes. This motif-finding algorithm uses Gibbs sampling to find the position probability matrix that represents the motif. Be sure to "uncheck" the appropriate box if you don't want the complementary strand included in the analysis.
(Reference: Thijs G et al. 2002. J Comput Biol. 9: 447–464)
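To make the Gibbs sampling idea concrete, here is a deliberately small site sampler in Python. It is not Motif Sampler's own implementation, and it rests on stated simplifications: exactly one motif copy per sequence, the forward strand only, a uniform background of 0.25 per base, and sequences containing only A, C, G and T. At each step one sequence is left out, a probability matrix is built from the current sites in the others, and a new site in the left-out sequence is drawn in proportion to its motif-to-background likelihood ratio. The alignment with the best total log-odds score seen so far is kept.

```python
import math
import random

BASES = "ACGT"
BACKGROUND = 0.25  # assumption: uniform, zero-order background

def profile(seqs, starts, width, skip=None, pseudocount=1.0):
    """Probability matrix from the current sites, optionally leaving one sequence out."""
    counts = [{b: pseudocount for b in BASES} for _ in range(width)]
    for i, (seq, s) in enumerate(zip(seqs, starts)):
        if i == skip:
            continue
        for j, base in enumerate(seq[s:s + width]):
            counts[j][base] += 1
    return [{b: col[b] / sum(col.values()) for b in BASES} for col in counts]

def alignment_score(seqs, starts, width):
    """Total log2-odds of all current sites under a profile built from every sequence."""
    pm = profile(seqs, starts, width)
    return sum(math.log2(pm[j][seq[s + j]] / BACKGROUND)
               for seq, s in zip(seqs, starts) for j in range(width))

def gibbs_motif_search(seqs, width, iterations=2000, seed=0):
    """Return the best motif start position found in each sequence."""
    rng = random.Random(seed)
    seqs = [s.upper() for s in seqs]
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    best_starts, best_score = list(starts), alignment_score(seqs, starts, width)
    for _ in range(iterations):
        i = rng.randrange(len(seqs))  # sequence to resample
        pm = profile(seqs, starts, width, skip=i)
        n_windows = len(seqs[i]) - width + 1
        weights = []
        for s in range(n_windows):
            ratio = 1.0
            for j in range(width):
                ratio *= pm[j][seqs[i][s + j]] / BACKGROUND
            weights.append(ratio)
        # Draw a new site in sequence i in proportion to the likelihood ratios.
        starts[i] = rng.choices(range(n_windows), weights=weights)[0]
        score = alignment_score(seqs, starts, width)
        if score > best_score:
            best_starts, best_score = list(starts), score
    return best_starts

print(gibbs_motif_search(["ACGTT", "GACGT", "ACGTC", "TACGT"], width=4))
```

On this toy input the shared site ACGT starts at position 0 or 1 in each sequence, which is what the sampler should converge to. Because the procedure is stochastic, real tools typically run several independent restarts and compare the results.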
BaMM
BaMM offers four tools, including (i) de novo discovery of enriched motifs in a set of nucleotide sequences, (ii) scanning a set of nucleotide sequences with motifs to find motif occurrences, and (iii) searching with an input motif for similar motifs in the BaMM database, which holds motifs for >1000 transcription factors trained on the GTRD ChIP-seq database.
(Reference: Kiesel A et al. 2018. Nucleic Acids Res. 46: W215–W220)
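For intuition about what "discovery of enriched motifs" means, the sketch below ranks k-mers by how much more often they occur in a set of foreground sequences (for example, ChIP-seq peaks) than in a background set. It is far simpler than BaMM's own method. Counting presence or absence per sequence, merging each k-mer with its reverse complement, and using a pseudocount of 1 are all assumptions made for this illustration. The sequences in the usage lines are invented.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer):
    """Represent a k-mer and its reverse complement by whichever sorts first."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])

def kmers_present(seq, k):
    """Set of canonical k-mers in seq, skipping windows with non-ACGT characters."""
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return {canonical(w) for w in windows if set(w) <= set("ACGT")}

def enriched_kmers(foreground, background, k, top=5, pseudocount=1.0):
    """Rank k-mers by (fraction of foreground sequences containing them) divided by
    (fraction of background sequences containing them), both with pseudocounts."""
    fg_sets = [kmers_present(s, k) for s in foreground]
    bg_sets = [kmers_present(s, k) for s in background]
    scores = {}
    for kmer in set().union(*fg_sets):
        fg_frac = (sum(kmer in s for s in fg_sets) + pseudocount) / (len(fg_sets) + 2 * pseudocount)
        bg_frac = (sum(kmer in s for s in bg_sets) + pseudocount) / (len(bg_sets) + 2 * pseudocount)
        scores[kmer] = fg_frac / bg_frac
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top]

# Invented toy data: every foreground sequence contains GATAA.
fg = ["CCCGATAACC", "TTTGATAAGG", "AAAGATAATT"]
bg = ["CCCCCCCCCC", "GGGGGGGGGG", "ACACACACAC"]
print(enriched_kmers(fg, bg, k=5, top=3))
```

A k-mer found this way is only a seed. Tools such as BaMM go on to refine a full motif model from such starting points.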
STAMP
STAMP: a web tool for exploring DNA-binding motif similarities.
(Reference: Mahony S & Benos PV. 2007. Nucleic Acids Res. 35: W253–W258)
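Motif-comparison tools typically score two matrices column by column and then search for the best alignment between them. The sketch below uses the Pearson correlation between columns, one common column-similarity measure, and averages it over every ungapped offset and both orientations of the second motif. It illustrates the idea under stated assumptions and is not STAMP's implementation. In particular, the ungapped alignment and the minimum overlap of 4 columns are choices made for this sketch.

```python
import math

BASES = "ACGT"
PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}

def column_pcc(a, b):
    """Pearson correlation between two columns (dicts of base -> probability)."""
    xs = [a[base] for base in BASES]
    ys = [b[base] for base in BASES]
    mx, my = sum(xs) / 4, sum(ys) / 4
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0  # a flat column correlates with nothing

def reverse_complement_matrix(m):
    """Reverse the column order and swap A<->T and C<->G probabilities."""
    return [{b: col[PAIR[b]] for b in BASES} for col in reversed(m)]

def motif_similarity(m1, m2, min_overlap=4):
    """Best mean column PCC as (score, offset, strand); column i of m1 faces
    column i - offset of m2 (or of its reverse complement when strand is '-')."""
    best = (-1.0, 0, "+")
    for strand, other in (("+", m2), ("-", reverse_complement_matrix(m2))):
        for offset in range(-(len(other) - min_overlap), len(m1) - min_overlap + 1):
            pairs = [(m1[i], other[i - offset]) for i in range(len(m1))
                     if 0 <= i - offset < len(other)]
            if len(pairs) < min_overlap:
                continue
            score = sum(column_pcc(a, b) for a, b in pairs) / len(pairs)
            if score > best[0]:
                best = (score, offset, strand)
    return best

# Hypothetical motif compared with its own reverse complement.
m = [{b: (0.85 if b == c else 0.05) for b in BASES} for c in "TTGACA"]
print(motif_similarity(m, reverse_complement_matrix(m)))
```

Checking both orientations matters because the same binding site can be reported on either strand.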
P2RP
P2RP (Predicted Prokaryotic Regulatory Proteins) predicts regulatory proteins, including transcription factors (TFs) and two-component systems (TCSs), based on analysis of DNA or protein sequences.
(Reference: Barakat M et al. 2013. BMC Genomics 14: 269)
Kmer Analysis
K-mers are short DNA sequences (substrings of length k) that are used in genome sequence analysis. Applications that use k-mers include genome assembly and sequence alignment.
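Counting k-mers is the basic operation behind these applications. The minimal Python sketch below counts overlapping k-mers on one strand only. A sequence of length L contains L - k + 1 such k-mers, drawn from 4^k possible DNA k-mers.

```python
from collections import Counter

def count_kmers(seq, k):
    """Count every overlapping substring of length k (forward strand only)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

print(dict(count_kmers("ACGTACGT", 3)))  # {'ACG': 2, 'CGT': 2, 'GTA': 1, 'TAC': 1}
```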
KmerFinder 3.2
KmerFinder 3.2 predicts the species of bacteria from pre-assembled, complete or partial genomes, and from short sequence reads. The prediction is based on the number of co-occurring k-mers (substrings of k nucleotides in DNA sequence data, in this case 16-mers) between the genomes of reference bacteria in a database and the genome provided by the user.
(Reference: Hasman H et al. 2013. J Clin Microbiol. 52: 139–146)
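The co-occurring k-mer idea can be sketched in a few lines. This is a simplification, not KmerFinder's implementation, because its database construction and scoring are not reproduced here. The sketch builds the full set of k-mers for the query and for each reference genome, counts how many are shared, and reports the reference with the most. The reference names and sequences in the usage lines are invented, and k is shortened from 16 to 4 so that the toy sequences overlap at all.

```python
def kmer_set(seq, k=16):
    """All distinct overlapping k-mers of seq (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def best_reference(query, references, k=16):
    """Return (name, shared_kmer_count) for the reference sharing the most k-mers with the query."""
    q = kmer_set(query, k)
    scores = {name: len(q & kmer_set(ref, k)) for name, ref in references.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]

# Invented toy references; real use would involve whole genomes and k = 16.
refs = {"refA": "ACGTTGCAAGT", "refB": "TTTTGGGGCCCC"}
print(best_reference("CGTTGCA", refs, k=4))  # ('refA', 4)
```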
kpLogo
kpLogo is a probability-based logo tool for the integrated detection and visualization of position-specific ultra-short motifs from a set of aligned sequences. Motifs of only 1–4 letters can play important roles when present at key locations within macromolecules, but existing motif-discovery tools typically miss these position-specific short motifs.
(Reference: Wu X & Bartel DP. 2017. Nucleic Acids Res. 45(W1): W534–W538)
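A simplified version of position-specific k-mer detection is sketched below. For each position in a set of aligned, equal-length sequences, it asks whether a k-mer occurs there more often than expected, using a one-sided binomial test. The expected probability of each k-mer is taken to be its frequency pooled over all positions. That background choice and the cutoff alpha = 1e-3 are assumptions for this sketch and do not reproduce kpLogo's own statistics. `math.comb` requires Python 3.8 or later.

```python
import math
from collections import Counter

def binomial_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def positional_kmers(aligned, k=1, alpha=1e-3):
    """Return (position, kmer, observed, p_value) for k-mers over-represented at a
    specific position of equal-length aligned sequences, most significant first."""
    n = len(aligned)
    positions = range(len(aligned[0]) - k + 1)
    per_pos = [Counter(s[p:p + k] for s in aligned) for p in positions]
    overall = sum(per_pos, Counter())  # assumption: background pooled over all positions
    total = sum(overall.values())
    hits = []
    for p, counts in zip(positions, per_pos):
        for kmer, observed in counts.items():
            p_value = binomial_upper_tail(n, observed, overall[kmer] / total)
            if p_value < alpha:
                hits.append((p, kmer, observed, p_value))
    return sorted(hits, key=lambda h: h[3])

# Invented aligned sequences: G appears only at position 3, in every sequence.
aligned = ["".join("ACT"[(i + j) % 3] for j in range(3)) + "G"
           + "".join("ACT"[(i + j) % 3] for j in range(2)) for i in range(20)]
print(positional_kmers(aligned, k=1))
```

Setting k = 2, 3 or 4 tests dinucleotides and longer ultra-short motifs in the same way.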
KmerKeys
KmerKeys: a web resource for searching indexed genome assemblies and variants.
(Reference: Pavlichin DS et al. 2022. Nucleic Acids Res. 50(W1): W448–W453)
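Indexing an assembly by k-mer is what makes this kind of search fast. The sketch below uses a generic hash-map index, which is an assumption and does not describe KmerKeys' actual data structures. Each k-mer maps to the contigs and positions where it occurs. A query at least k bases long is then located by requiring each of its overlapping k-mers to appear at consecutive positions. The contig names and sequences are invented.

```python
from collections import defaultdict

def build_index(contigs, k):
    """Map each k-mer to the (contig, 0-based position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in contigs.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index

def locate(index, query, k):
    """Exact occurrences of a query (len >= k): keep positions where every
    overlapping k-mer of the query appears at the matching offset."""
    query = query.upper()
    hits = set(index.get(query[:k], []))
    for j in range(1, len(query) - k + 1):
        next_hits = set(index.get(query[j:j + k], []))
        hits = {(name, pos) for name, pos in hits if (name, pos + j) in next_hits}
    return sorted(hits)

# Invented toy assembly with two contigs.
contigs = {"chr1": "ACGTACGTTT", "chr2": "TTACGTAC"}
idx = build_index(contigs, k=4)
print(locate(idx, "ACGTA", k=4))  # [('chr1', 0), ('chr2', 2)]
```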
Updated: December, 2025