IUB (Degenerate Bases) Code Table
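
For quick reference, the standard IUB/IUPAC degenerate base codes can be written as a lookup table. The following is a minimal Python sketch; the names `IUB_CODES`, `expand` and `matches` are illustrative choices, not part of any tool listed on this page.

```python
# Standard IUB/IUPAC nucleotide codes: each symbol maps to the bases it can stand for.
IUB_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",   # purine
    "Y": "CT",   # pyrimidine
    "S": "CG",   # strong pairing
    "W": "AT",   # weak pairing
    "K": "GT",   # keto
    "M": "AC",   # amino
    "B": "CGT",  # not A
    "D": "AGT",  # not C
    "H": "ACT",  # not G
    "V": "ACG",  # not T
    "N": "ACGT", # any base
}


def expand(pattern):
    """Return every unambiguous sequence encoded by a degenerate pattern."""
    results = [""]
    for symbol in pattern.upper():
        results = [prefix + base for prefix in results for base in IUB_CODES[symbol]]
    return results


def matches(pattern, seq):
    """True if the unambiguous sequence seq is one of the sequences the pattern encodes."""
    if len(pattern) != len(seq):
        return False
    return all(b in IUB_CODES[p] for p, b in zip(pattern.upper(), seq.upper()))
```

For example, `expand("AR")` lists the two sequences covered by the pattern, and `matches("TTGAYA", "TTGACA")` checks a single candidate without expanding the pattern.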

[Table: IUB Code]

VecScreen (National Center for Biotechnology Information) - screens your DNA sequence for potential vector contamination. Well worth running before doing any other analysis.

Base composition - consider WORDCOUNT (EMBOSS Suite), which lets you choose the "word size", and GEMS (Genomatix, Germany). The latter provides a clear output of mono-, di- and trinucleotide frequencies. Select "create statistics" and "start task" to get to the sequence entry page.
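
The idea of counting "words" of a chosen size can be sketched in a few lines of Python. This is an illustration only, not the WORDCOUNT or GEMS implementation; the function names are hypothetical, and words are counted as overlapping windows on one strand.

```python
from collections import Counter


def word_counts(seq, word_size):
    """Count every overlapping word (k-mer) of length word_size in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + word_size] for i in range(len(seq) - word_size + 1))


def word_frequencies(seq, word_size):
    """Convert word counts to relative frequencies (empty dict if seq is too short)."""
    counts = word_counts(seq, word_size)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}
```

Calling `word_counts(seq, 1)`, `word_counts(seq, 2)` and `word_counts(seq, 3)` gives mono-, di- and trinucleotide counts respectively.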

Genomics %G~C Content Calculator (Science - simple calculator for mol%G+C that also counts the individual bases.

Compositional heterogeneity - Graphe:ADN riche en: (Atelier BioInformatique, Université de Provence, France). N.B. In French, but easy to follow (Soumettre = Submit). Plots AT, GC or single-base enrichment along the sequence. A simpler version is GC Content Plot Online.

GraphDNA - DNA Skew Graphing (Viral Bioinformatics Resource Center, University of Victoria, Canada) - this Java applet performs DNA walks and purine, AT and GC skews on small (<1 Mb) genomes. Requires registration and login. Alternative sites for cumulative GC skew are GC Skewing (Davidson College, U.S.A.) and GenSkew: Genomic nucleotide skew application (developed by TU Munich; maintained by the Department of Computational Systems Biology of the University of Vienna, Austria).
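
A cumulative GC skew curve of the kind these tools plot can be approximated as follows. This is a minimal sketch under stated assumptions: skew is taken as (G - C)/(G + C) in consecutive non-overlapping windows and then summed; the function name and the default window size are illustrative and not taken from any of the tools above.

```python
def cumulative_gc_skew(seq, window=1000):
    """Return the running sum of (G - C)/(G + C) over consecutive windows."""
    seq = seq.upper()
    total = 0.0
    curve = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        if g + c:  # windows without G or C contribute nothing
            total += (g - c) / (g + c)
        curve.append(total)
    return curve
```

In many bacterial genomes the extremes of such a curve tend to fall near the replication origin and terminus, which is why the cumulative form is usually plotted rather than the raw per-window values.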

GC Content Calculator (Biologics International Corp, Indianapolis, USA) - DNA GC-content percentage is calculated as Count(G + C)/Count(A + T + G + C) * 100. This program was used to generate the following diagram of Escherichia phage lambda (NC_001416) using a window of 48 bp. One can click on the peaks and valleys to get a read-out of the local GC content.
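
The formula above translates directly into code. The sketch below is an assumption-laden illustration, not the calculator's own implementation: the function names are hypothetical, ambiguous bases such as N are simply left out of the denominator, and the 48 bp default mirrors the window mentioned for the lambda example.

```python
def gc_content(seq):
    """GC percentage: Count(G + C) / Count(A + T + G + C) * 100."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    atgc = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / atgc if atgc else 0.0


def gc_profile(seq, window=48, step=48):
    """Return (start, GC%) for each full window along the sequence."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]
```

A smaller `step` than `window` gives overlapping windows and a smoother profile at the cost of more points.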

JaMBW (European Molecular Biology Laboratory, Heidelberg, Germany) - Java-based Molecular Biologist's Workbench. Select Chapter 1 for sequence format conversion (upper/lower case; T/U; reverse or complement sequence). N.B. Also check out Chapter 5, "Buffer Calculator".
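
The conversions listed for Chapter 1 are simple string operations. A minimal Python sketch follows; the function names are illustrative, and only the unambiguous DNA bases A, C, G and T are complemented (any other character passes through unchanged).

```python
# Translation table for complementing unambiguous DNA bases, preserving case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq):
    """Complement each base without changing the order."""
    return seq.translate(_COMPLEMENT)


def reverse(seq):
    """Reverse the sequence without complementing it."""
    return seq[::-1]


def reverse_complement(seq):
    """Return the opposite strand read 5' to 3'."""
    return complement(seq)[::-1]


def dna_to_rna(seq):
    """Replace T with U, preserving case."""
    return seq.replace("T", "U").replace("t", "u")


def rna_to_dna(seq):
    """Replace U with T, preserving case."""
    return seq.replace("U", "T").replace("u", "t")
```

Case conversion itself needs no helper: `seq.upper()` and `seq.lower()` already do it.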

DSHIFT - a web server for predicting DNA 1H, 13C & 31P chemical shifts (Reference: S.L. Lam. 2007. Nucl. Acids Res. 35(Web Server issue): W713-W717).

Random DNA sequence generator (Reference: Villesen, P. 2007. Molecular Ecology Notes 7: 965–968). Similar resources are available here and here.

GenRGenS - software dedicated to the random generation of genomic sequences; it supports several classes of models, including Markov chains, HMMs, context-free grammars, PROSITE patterns and more (Reference: Y. Ponty et al. Bioinformatics 22: 1534-1535).
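
Two of the simplest models behind such generators, independent bases with a chosen GC fraction and a first-order Markov chain, can be sketched as below. This is only an illustration and does not reproduce either tool; the function names, the `gc` parameter and the layout of the `transitions` dictionary are assumptions made for the example.

```python
import random


def random_dna(length, gc=0.5, rng=None):
    """Draw independent bases so that G + C has expected fraction gc."""
    rng = rng or random.Random()
    weights = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]  # order: A, C, G, T
    return "".join(rng.choices("ACGT", weights=weights, k=length))


def markov_dna(length, transitions, start="A", rng=None):
    """First-order Markov chain: transitions[prev] maps each next base to its probability."""
    rng = rng or random.Random()
    seq = [start]
    while len(seq) < length:
        options = transitions[seq[-1]]
        seq.append(rng.choices(list(options), weights=list(options.values()))[0])
    return "".join(seq[:length])
```

Passing a seeded `random.Random(...)` as `rng` makes a run reproducible, which is useful when random sequences serve as a control for another analysis.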

Signature (Institute of Bioinformatics, University of Georgia, U.S.A.) - finds under- and over-represented short oligonucleotides (di-, tri- and tetranucleotides) in a genome sequence.
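
One common way to judge whether a dinucleotide is under- or over-represented is the odds ratio of its observed frequency to the product of its two base frequencies. The sketch below illustrates that measure only; whether Signature computes exactly this statistic is an assumption, the function name is hypothetical, and a single strand is analysed.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq):
    """Return observed/expected ratios f(xy) / (f(x) * f(y)) for each dinucleotide seen."""
    seq = seq.upper()
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono = len(seq)
    n_di = max(len(seq) - 1, 0)
    ratios = {}
    for pair, count in di.items():
        expected = (mono[pair[0]] / n_mono) * (mono[pair[1]] / n_mono)
        ratios[pair] = (count / n_di) / expected
    return ratios
```

Ratios well below 1 point to under-representation and ratios well above 1 to over-representation; dinucleotides that never occur are absent from the result rather than reported as 0.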

AIMIE (Ab Initio Motif Identification Environment) - this tool should be useful for picking up high-copy dispersed repeats, such as repetitive extragenic palindromic (REP) elements, CRISPR repeats, uptake signal sequences (DUS/USS), intergenic dyad sequences and several other over-represented sequence motifs in genome sequences (Reference: Mrázek, J. et al. 2008. Bioinformatics 24: 1041-1048).

fwDNA (Institute of Bioinformatics, University of Georgia, U.S.A.) - finds frequent words (oligonucleotides) in a genome sequence.

ASEQH (Analysis of Sequence Heterogeneity) (Institute of Bioinformatics, University of Georgia, U.S.A.) - generates sliding window plots of seven different sequence properties: G + C content; S3, G + C at codon site 3; d*, differences with respect to the genomic average; synonymous codon bias with respect to the genomic average; amino acid composition differences with respect to the genomic average; G-C skew, (G - C) / (G + C); and A-T skew, (A - T) / (A + T). It is intended for analysis of prokaryotic genomes but it can be applied to eukaryotic chromosomes with some limitations.
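
Three of these properties follow directly from the formulas given: the two skews and G + C at codon site 3. The Python sketch below is illustrative only and is not ASEQH's code; the function names are hypothetical, and `gc3` assumes its input is a coding sequence that starts in frame.

```python
def skew(seq, x, y):
    """Generic skew (x - y) / (x + y); 0.0 when neither base occurs."""
    nx, ny = seq.count(x), seq.count(y)
    return (nx - ny) / (nx + ny) if nx + ny else 0.0


def gc3(cds):
    """Fraction of G + C at the third position of each codon (in-frame input assumed)."""
    third = cds.upper()[2::3]
    return (third.count("G") + third.count("C")) / len(third) if third else 0.0


def sliding_skews(seq, window, step):
    """Return (start, G-C skew, A-T skew) for each full window."""
    seq = seq.upper()
    rows = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        rows.append((start, skew(chunk, "G", "C"), skew(chunk, "A", "T")))
    return rows
```

The remaining properties (d*, codon bias and amino acid composition differences) need gene annotation and genome-wide averages, so they are not attempted here.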

PATLOC (Pattern Locator) (Institute of Bioinformatics, University of Georgia, U.S.A.) - a tool for finding sequence patterns in long DNA sequences. The web-based service uses a restricted version of Pattern Locator, which estimates the time needed to complete the search and stops if the estimated CPU time exceeds a certain limit (currently 90 seconds). The CPU time limit was introduced to protect the web server from overloading due to requests involving overly complex sequence patterns. If you want to search for Sigma-70 (RpoD)-like promoters, the pattern syntax for your search is:  <>{TTGACA(N)[15:18]TATAAT}[4].  N.B. the [4] allows for 4 mismatches - I recommend a maximum of two. If you only want one strand screened, omit the <> at the start. You can restrict the search to intergenic regions (but this will also eliminate matches that partially overlap with genes), or use the .patvic.txt output file to find where the matches are (Jan Mrázek, personal communication).