COMPOSITION

IUB (Degenerate Bases) Code Table

IUB Code    Bases
N           A,C,G,T
V           G,A,C
B           G,T,C
H           A,T,C
D           G,A,T
K           G,T
S           G,C
W           A,T
M           A,C
Y           C,T
R           A,G
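
The code table above translates directly into a lookup for searching with degenerate patterns. The following is a minimal Python sketch, not taken from any of the tools listed on this page; the function names `iub_to_regex` and `find_degenerate` are illustrative assumptions.

```python
import re

# IUB degenerate codes from the table above, plus the four plain bases.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "W": "AT", "S": "CG", "K": "GT",
    "D": "AGT", "H": "ACT", "B": "CGT", "V": "ACG", "N": "ACGT",
}

def iub_to_regex(pattern):
    """Translate a degenerate DNA pattern into a regular expression string."""
    parts = []
    for symbol in pattern.upper():
        bases = IUB[symbol]  # raises KeyError for symbols outside the table
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)

def find_degenerate(pattern, sequence):
    """Return 0-based start positions of all matches, including overlapping ones."""
    # A lookahead lets the regex engine report overlapping matches.
    regex = re.compile("(?=" + iub_to_regex(pattern) + ")")
    return [m.start() for m in regex.finditer(sequence.upper())]
```

For example, `find_degenerate("RY", sequence)` lists every position where a purine is followed by a pyrimidine.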

VecScreen (National Center for Biotechnology Information) - screens your DNA sequence for potential vector sequence. Well worth running before doing any other analysis.

Base composition - consider WORDCOUNT (Pasteur Institute, France), which lets you choose the "word size", and GEMS (Genomatix, Germany), which provides a clear output of mono-, di- and trinucleotide frequencies. In GEMS, select "create statistics" and "start task" to get to the sequence entry page.
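
The idea of counting "words" of a chosen size can be sketched in a few lines of Python. This is only an illustration of the principle, not the code behind WORDCOUNT or GEMS; the names `word_counts` and `word_frequencies` are assumptions.

```python
from collections import Counter

def word_counts(sequence, word_size):
    """Count every overlapping word (k-mer) of the given size."""
    seq = sequence.upper()
    return Counter(seq[i:i + word_size] for i in range(len(seq) - word_size + 1))

def word_frequencies(sequence, word_size):
    """Convert word counts into relative frequencies (fractions summing to 1)."""
    counts = word_counts(sequence, word_size)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}
```

Setting `word_size` to 1, 2 or 3 gives mono-, di- and trinucleotide frequencies respectively.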

Genomics %G~C Content Calculator (Science Buddies.org) - a simple calculator for mol% G+C that also counts the individual bases.
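
For quick checks offline, the same calculation is easy to reproduce. A minimal Python sketch follows; the function name `gc_content` is an assumption, and ambiguous symbols such as N are simply left out of the total, which may differ from how the online calculator treats them.

```python
def gc_content(sequence):
    """Return (percent G+C, per-base counts) for a DNA sequence."""
    seq = sequence.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())  # assumption: non-ACGT symbols are ignored
    percent_gc = 100.0 * (counts["G"] + counts["C"]) / total if total else 0.0
    return percent_gc, counts
```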

YMF 3.0 - a program that detects statistically overrepresented words (motifs) in DNA sequences. The user may specify the characteristics of the motifs to be detected. A motif here is a short string of nucleotides, degenerate symbols, and spacers. 'Motif size' is the number of non-spacer characters in a motif. Spacers ('N's) are constrained to be in the center of the motif. Degenerate symbols allowed in a motif are R (purine - A or G), Y (pyrimidine - C or T), W (A or T), and S (C or G). (Reference: Sinha, S. & Tompa, M. 2003. Nucleic Acids Research 31:3586-3588).

Compositional heterogeneity - Graphe:ADN riche en: (Atelier BioInformatique, l'Université de Provence, France). N.B. The page is in French but self-explanatory (Soumettre = Submit). It presents AT, GC or single-base enrichment in the sequence in graphic format. A simpler version is GC Content Plot Online.

Graph DNA: DNA Skew Graphing (Viral Bioinformatics Resource Center, University of Victoria, Canada) - this Java applet performs DNA walks and purine, AT and GC skews on small (<1 Mb) genomes. Requires registration and login. Alternative sites for cumulative GC skew are GC Skewing (Davidson College, U.S.A.) and GenSkew: Genomic nucleotide skew application (developed by TU Munich; maintained by the Department of Computational Systems Biology of the University of Vienna, Austria).
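
GC skew is conventionally computed per window as (G - C) / (G + C), and the cumulative skew is the running sum of those values. The Python sketch below illustrates this under the assumption of non-overlapping windows; the function names and the default window size are hypothetical and not taken from any of the tools above.

```python
from itertools import accumulate

def gc_skew(sequence, window=1000):
    """Return (G - C) / (G + C) for consecutive non-overlapping windows."""
    seq = sequence.upper()
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews

def cumulative_skew(skews):
    """Running sum of window skews, as plotted in cumulative skew diagrams."""
    return list(accumulate(skews))
```

In many bacterial genomes the minimum and maximum of the cumulative curve tend to lie near the replication origin and terminus.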

Z curve (Centre of BioInformatics, Tianjin University, China) - produces a unique three-dimensional curve representation of a given DNA sequence, composed of three components (x_n, y_n and z_n).
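
In the usual formulation of the Z curve, after n bases the components are x_n = (A_n + G_n) - (C_n + T_n), y_n = (A_n + C_n) - (G_n + T_n) and z_n = (A_n + T_n) - (G_n + C_n), where A_n, C_n, G_n and T_n are the cumulative base counts. The Python sketch below computes these points; it is an illustration written for this page, not the server's code, and the choice to leave the curve unchanged on ambiguous symbols is an assumption.

```python
def z_curve(sequence):
    """Return the list of (x_n, y_n, z_n) points for n = 1..len(sequence)."""
    # Each base moves the curve by one unit along each axis:
    # x: purine (A,G) vs pyrimidine (C,T)
    # y: amino (A,C) vs keto (G,T)
    # z: weak (A,T) vs strong (G,C) hydrogen bonding
    steps = {
        "A": (1, 1, 1),
        "C": (-1, 1, -1),
        "G": (1, -1, -1),
        "T": (-1, -1, 1),
    }
    x = y = z = 0
    points = []
    for base in sequence.upper():
        dx, dy, dz = steps.get(base, (0, 0, 0))  # assumption: other symbols add nothing
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points
```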

DNA base composition analysis tool (J. Zheng, Queen's University, Canada) - this program can analyze a 30 kb DNA sequence in three different ways. For a given sequence it computes the percentage of one or two selectable nucleotides, the normal skew of two selectable nucleotides, and the cumulative skew of two selectable nucleotides. The results can be displayed in both graphic and numerical format.

JaMBW (European Molecular Biology Laboratory of Heidelberg, Germany) - Java based Molecular Biologist's Workbench. Select Chapter 1 for sequence format conversion (upper <---> lower case; T <---> U; reverse or complement sequence). N.B. Also check out Chapter 5, "Buffer Calculator."
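
The conversions listed for Chapter 1 are simple string operations. A minimal Python sketch follows; the function names are assumptions, and only unambiguous bases are handled (other symbols pass through unchanged). Case conversion itself is just `str.upper()` and `str.lower()`.

```python
# Translation table for complementing DNA or RNA; case is preserved.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")

def reverse(sequence):
    return sequence[::-1]

def complement(sequence):
    return sequence.translate(_COMPLEMENT)

def reverse_complement(sequence):
    return complement(sequence)[::-1]

def dna_to_rna(sequence):
    """Replace T with U, preserving case."""
    return sequence.replace("T", "U").replace("t", "u")

def rna_to_dna(sequence):
    """Replace U with T, preserving case."""
    return sequence.replace("U", "T").replace("u", "t")
```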

DSHIFT - a web server for predicting DNA 1H, 13C & 31P chemical shifts (Reference: S.L. Lam. 2007. Nucl. Acids Res. 35(Web Server issue): W713-W717).

Computation of size of DNA and Protein Fragments from Their Electrophoretic Mobility (Reference: Raghava, G. P. S. 2001. Biotech Software and Internet Report 2:198-200).

Random DNA sequence generator (Reference: Villesen, P. 2007. Molecular Ecology Notes 7: 965–968). Similar resources are available here and here.

GenRGenS - software dedicated to the random generation of genomic sequences. It supports several classes of models, including Markov chains, HMMs, context-free grammars, PROSITE patterns and more. (Reference: Y. Ponty et al. Bioinformatics, 22:1534-1535).
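
The two simplest models mentioned above, independent bases and a first-order Markov chain, can be sketched in Python as follows. This is an illustration only and does not reproduce the interface of either generator; the function names, the `seed` parameter and the layout of the `transitions` table are assumptions.

```python
import random

def random_dna(length, weights=(0.25, 0.25, 0.25, 0.25), seed=None):
    """Draw each base independently; weights are in the order A, C, G, T."""
    rng = random.Random(seed)
    return "".join(rng.choices("ACGT", weights=weights, k=length))

def markov_dna(length, transitions, start="A", seed=None):
    """Generate a sequence from a first-order Markov chain.

    transitions maps each base to a dict {next_base: probability}.
    """
    rng = random.Random(seed)
    seq = [start]
    while len(seq) < length:
        options = transitions[seq[-1]]
        next_base = rng.choices(list(options), weights=list(options.values()))[0]
        seq.append(next_base)
    return "".join(seq[:length])
```

Passing a fixed `seed` makes a run reproducible, which is useful when a random sequence serves as a control.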

Signature (Institute of Bioinformatics, University of Georgia, U.S.A.) - finds under- and over-represented short oligonucleotides (di-, tri- and tetranucleotides) in a genome sequence.
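
One common way to judge whether a dinucleotide is under- or over-represented is the ratio of its observed frequency to the frequency expected from the base composition, f(XY) / (f(X) f(Y)). The Python sketch below computes this ratio for a single strand; it is an assumption that this resembles the statistic the tool reports, and the function name is hypothetical.

```python
from collections import Counter

def dinucleotide_odds(sequence):
    """Return observed/expected ratios for every dinucleotide present."""
    seq = sequence.upper()
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono, n_di = len(seq), len(seq) - 1
    ratios = {}
    for pair, count in di.items():
        # Expected frequency if the two bases occurred independently.
        expected = (mono[pair[0]] / n_mono) * (mono[pair[1]] / n_mono)
        ratios[pair] = (count / n_di) / expected
    return ratios
```

Ratios well below 1 suggest under-representation and ratios well above 1 suggest over-representation; where to draw the line is a matter of convention.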

AIMIE Ab Initio Motif Identification Environment - this tool should be useful for picking up high-copy dispersed repeats, such as repeated extragenic palindrome (REP) elements, CRISPR repeats, uptake signal sequences (DUS/USS), intergenic dyad sequences and several other over-represented sequence motifs in genome sequences. (Reference: Mrázek, J. et al. 2008. Bioinformatics 24: 1041-1048).

fwDNA (Institute of Bioinformatics, University of Georgia, U.S.A.) - Find Frequent Words (oligonucleotides) in a genome sequence.

ASEQH Analysis of sequence heterogeneity (Institute of Bioinformatics, University of Georgia, U.S.A.) - generates sliding window plots of seven different sequence properties: G + C content; S3 (G + C at codon site 3); d* (differences with respect to genomic average); synonymous codon bias with respect to genomic average; amino acid composition differences with respect to genomic average; G-C skew, (G - C) / (G + C); and A-T skew, (A - T) / (A + T). It is intended for analysis of prokaryotic genomes but can be applied to eukaryotic chromosomes with some limitations.
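
The sliding window principle can be illustrated with the simplest of these properties, G + C content, reported together with its deviation from the whole-sequence average. This Python sketch is an assumption-laden illustration rather than the tool's method; the function name, the default window and step sizes, and the output layout are all hypothetical.

```python
def gc_fraction(sequence):
    """Fraction of G and C among all symbols in the sequence."""
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def gc_window_profile(sequence, window=1000, step=100):
    """Return (start, window G+C, difference from overall G+C) per window."""
    overall = gc_fraction(sequence)
    rows = []
    # Windows overlap whenever step < window; partial windows at the end are skipped.
    for start in range(0, len(sequence) - window + 1, step):
        gc = gc_fraction(sequence[start:start + window])
        rows.append((start, gc, gc - overall))
    return rows
```

Plotting the third column against the first highlights regions whose composition departs from the rest of the sequence.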

PATLOC (Pattern Locator) (Institute of Bioinformatics, University of Georgia, U.S.A.) - a tool for finding sequence patterns in long DNA sequences. This web-based service uses a restricted version of Pattern Locator, which estimates the time needed to complete the search and stops if the estimated CPU time exceeds a certain limit (currently 90 seconds). The CPU time limit was introduced to protect the web server from overloading due to requests involving overly complex sequence patterns. If you want to search for Sigma-70 (RpoD)-like promoters, the pattern syntax for your search is: <>{TTGACA(N)[15:18]TATAAT}[4]. N.B. The [4] allows for 4 mismatches; I recommend a maximum of two. If you only want one strand screened, omit the <> at the start. You can restrict the search to intergenic regions (but this will also eliminate matches that partially overlap with genes), or use the .patvic.txt output file to find where the matches are (Jan Mrázek, personal communication).
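
The logic of such a search (two hexamers separated by 15 to 18 unspecified bases, with a limit on the total number of mismatches) can be sketched in Python. This is an independent illustration and does not parse or reproduce Pattern Locator's syntax; the function names and parameters are assumptions, and only the given strand is scanned.

```python
def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def find_promoters(sequence, box35="TTGACA", box10="TATAAT",
                   spacer=(15, 18), max_mismatches=2):
    """Return (start, spacer length, total mismatches) for each hit."""
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - len(box35) + 1):
        mm35 = mismatches(seq[start:start + len(box35)], box35)
        if mm35 > max_mismatches:
            continue
        for gap in range(spacer[0], spacer[1] + 1):
            pos10 = start + len(box35) + gap
            candidate = seq[pos10:pos10 + len(box10)]
            if len(candidate) < len(box10):
                break  # ran off the end of the sequence
            total = mm35 + mismatches(candidate, box10)
            if total <= max_mismatches:
                hits.append((start, gap, total))
    return hits
```

To screen the opposite strand as well, the same function can be run on the reverse complement of the sequence.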