TRANSLATION: DNA → PROTEIN

SITES: A number of excellent sites exist, all of which permit translation in all six reading frames. I would recommend "ORF Finder" for its visuals, and Pipeline or GeneMark if you are seriously interested in identifying genes within your sequence. The latter two programs permit the analysis of long sequences (submit them as a file attachment rather than pasting them into the box).

Frameshift errors in DNA sequence - three sites are available:

  GENIO/frame (Genio/scan N. Mache)
  FrameD (Toulouse Genopole, France; T. Schiex et al. 2003. Nucl. Acids Res. 31: 3738-3741)
  AMIGene

StarORF - facilitates the identification of the protein(s) encoded within a DNA sequence. Using StarORF, the DNA sequence is first transcribed into RNA and then translated into all the potential open reading frames (ORFs) encoded within each of the six translation frames (3 in the forward direction and 3 in the reverse direction). This allows students to identify the translation frame that results in the longest protein coding sequence.
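The six-frame idea can be sketched in a few lines of Python. This is only an illustration and not StarORF's own code: it assumes the standard genetic code, translates the DNA directly rather than going through an RNA intermediate, treats an ORF as the stretch from the first ATG (M) to the next stop codon, and the function names (six_frames, longest_orf) are made up for this example.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate from the first base; codons not in the table become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Return {frame label: protein} for frames +1..+3 and -1..-3."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def longest_orf(dna):
    """Return (frame, peptide) for the longest M...stop stretch in any frame."""
    best = ("", "")
    for frame, protein in six_frames(dna).items():
        # Drop the last piece: it is not closed by a stop codon.
        for segment in protein.split("*")[:-1]:
            start = segment.find("M")
            if start != -1 and len(segment) - start > len(best[1]):
                best = (frame, segment[start:])
    return best
```

For example, longest_orf("actATGGTTTCCCCTACATAAATGCCAATAACGTAAcccggg") picks frame +1 and the peptide MVSPT. Real ORF finders also consider alternative start codons (GTG, TTG) and a minimum ORF length, which this sketch leaves out.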

Batch ORF Finder (M.V. Graves, University of Massachusetts Lowell, U.S.A.)

TICO - Translation Initiation site COrrection - provides an interface for direct post processing of the predictions obtained from GLIMMER to improve the accuracy of annotated Translation Initiation Sites (TIS). (Reference: M. Tech et al. 2005. Bioinformatics 21: 3568-3569)

ORF (S. van Hijum, Molecular Genetics (MolGen), University of Groningen, Netherlands) - predicts ORFs in a FASTA input DNA sequence by using either Glimmer 2.1 and RBSfinder, or ZCurve_V.

Genome Analysis Pipeline (Genome Analysis and System Modeling Group, Life Sciences Division, Oak Ridge National Laboratory, U.S.A.) - offers genome analysis for prokaryotes (Generation and Glimmer) as well as for higher eukaryotes and Saccharomyces (GrailEXP and Genscan). For prokaryote analysis there is a wide choice of model organisms as well as the option "Select all services". Very nice Java and HTML presentation of analysis results.

GeneMark Homepage (Dr. M. Borodovsky, Georgia Institute of Technology, Atlanta, U.S.A.) - offers a family of programs for ORF analysis. This site links one to a growing number of programs for modeling phage, bacterial, and eukaryotic data. Extensive control over the data output is possible, i.e. one can request the nucleotide and protein sequence of the ORFs. This service can also be accessed at EMBL.

Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a microbial (bacterial and archaeal) gene finding program developed at Oak Ridge National Laboratory and the University of Tennessee. Key features of Prodigal include (a) Speed: it can analyze an entire microbial genome in 30 seconds or less; (b) Accuracy: it possesses a very sophisticated ribosomal binding site scoring system that enables it to locate the translation initiation site with great accuracy (96% of the 5' ends in the Ecogene data set are located correctly); (c) Specificity: Prodigal's false positive rate compares favorably with other gene identification programs, and usually falls under 5%; and (d) GC-Content Indifferent: it performs well even in high GC genomes. (Reference: Hyatt D et al. 2012. Bioinformatics 28:2223-2230).

Generation - Microbial Gene Prediction System (Genome Analysis & System Modeling Group, Life Sciences Division, Oak Ridge National Laboratory, U.S.A.) can train itself on microbial and model organisms to produce a set of data which can be used by GrailEXP v3.0 to recognize genes in these organisms.

EasyGene (Technical University of Denmark; Reference: T.S. Larsen and A. Krogh. 2003. EasyGene - a prokaryotic gene finder that ranks ORFs by statistical significance. BMC Bioinformatics 4:21) - produces a list of predicted genes given a sequence of prokaryotic DNA. Each prediction is given a significance score (R-value) indicating how likely it is to be just a non-coding open reading frame rather than a real gene. The user needs only to specify the organism hosting the query sequence. If you are interested in the analysis of existing bacterial genomes consult EasyGene 1.2.

FrameD - a noise-resistant gene finder for prokaryotic and matured eukaryotic sequences, offering considerable flexibility in search strategies and output format (Reference: Schiex, T. et al. 2003. Nucl. Acids Res. 31: 3738-3741).

AMIGene (Reference: Bocs, S. et al. 2003. Nucl. Acids Res. 31: 3723-3726)

FgenesB (SoftBerry) - fast Pattern/Markov chain-based bacterial operon and gene prediction. Somewhat limited range of model bacteria & archaea.

Gene Identification (Shibuya & Rigoutsos, IBM Bioinformatics Group, U.S.A.) - select "gene identification" and specify start codons (ATG, GTG or TTG). When the data are presented, one can modify the minimum ORF size and obtain both the DNA and protein sequence of the ORF of interest by clicking on the diagram.

FramePlot 2.3 (National Institute of Health, Japan) - This site permits one to select the minimal size of the ORF and the start codon (ATG or GTG being the most common). While the presentation (a series of coloured arrows) is somewhat confusing, by clicking on any arrow one can view the DNA and protein sequence. These can be used in homology (BLASTN & BLASTP) searches.

ExPASy – Translate tool (ExPASy, University of Geneva, Switzerland).  I find this site useful if I have a gene which begins with an alternative start codon.  An alternative site is Translate Nucleic Acid Sequence Tool (University of Massachusetts Medical School, U.S.A.) which permits choice of reading frame(s) and genetic code.

Third Position GC Skew Display (The Institute for Genomic Research, U.S.A.) predicts genes by comparing possible open reading frames (variety of initiation codon options) to a third position GC plot. This tool is apparently most effective for genomes with a high G+C content.

Six-frame translations can be done at Tuebingen, Chicago, Russia, Bioline, and Science Launcher.

MBS Translator (JustBio Tools) - An excellent new site, since one can translate specifically from ATG and the results are presented with the nucleotide sequence overlaying the amino acid sequence. Ideal for cut/paste into a manuscript. You need to register to use this free tool. Other quick translation tools are here and here.

Translate a sequence for publication - use:  DNA Analyzer (York University, Canada).  Choose "Translation to proteins".

        M   V   S   P   T   *   M   P   I   T   *               
11 act ATG GTT TCC CCT ACA TAA ATG CCA ATA ACG TAA cccggg 42
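
A layout of this kind is easy to generate yourself. The following is a rough sketch, assuming the standard genetic code and that you already know where the coding region starts and ends; the function name layout and its three arguments are invented for this example, and the position numbers at either end of the line are left out.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def layout(upstream, coding, downstream):
    """Return two lines: amino acids centred over space-separated codons."""
    if len(coding) % 3:
        raise ValueError("coding region length must be a multiple of 3")
    codons = [coding[i:i + 3].upper() for i in range(0, len(coding), 3)]
    residues = [CODON_TABLE.get(codon, "X") for codon in codons]

    # Flanking sequence in lower case, codons in upper case.
    parts = ([upstream.lower()] if upstream else []) + codons
    if downstream:
        parts.append(downstream.lower())
    dna_line = " ".join(parts)

    # Each residue sits over the middle base of its codon.
    indent = " " * (len(upstream) + 1) if upstream else ""
    aa_line = indent + " ".join(f" {residue} " for residue in residues)
    return aa_line.rstrip() + "\n" + dna_line


print(layout("act", "ATGGTTTCCCCTACATAAATGCCAATAACGTAA", "cccggg"))
```

Printed in a fixed-width font, the residues line up over the codons as in the example above.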

Translation of multiple sequences:

Virtual Ribosome (Reference: R. Wernersson. 2006. Nucl. Acids Res. 34 (Web Server issue): W385-388) - I find that the output from the first two sites is optimal for translating multiple DNA sequences.

RevTrans 1.4 Server (CBS, Danish Technical University)

DNA to Protein Translation Calculator (Bioinformatic.Net)

Backtranslation: i.e. taking a protein sequence and deriving a DNA sequence that could encode it.

Back Translation - part of The Sequence Manipulation Suite; limited choice of codon usage (E. coli and H. sapiens)

Protein to DNA reverse translation - includes a wide range of genetic codes

Reverse translation of aminoacid sequences - probably the best in that it includes the genetic codes of seven organisms (E. coli and 6 eukaryotes), and provides consensus and detailed output of results in RNA or DNA.
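
To show what a "consensus" backtranslation means, here is a small sketch that needs no codon usage data at all. It is not the method of any of the sites above: it assumes the standard genetic code and writes, for each amino acid, one degenerate codon built position by position from IUPAC ambiguity codes. The function names are made up for this example.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC ambiguity codes, keyed by the alphabetically sorted set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}


def consensus_codon(amino_acid):
    """Degenerate codon covering every codon for one amino acid (or '*')."""
    codons = [c for c, aa in CODON_TABLE.items() if aa == amino_acid]
    if not codons:
        raise ValueError(f"no codon for {amino_acid!r} in the standard code")
    return "".join(
        IUPAC["".join(sorted({codon[pos] for codon in codons}))]
        for pos in range(3)
    )


def back_translate(protein):
    """Back-translate a protein into a degenerate (consensus) DNA sequence."""
    return "".join(consensus_codon(aa) for aa in protein.upper())
```

For example, back_translate("MFW") gives ATGTTYTGG. Note that the position-wise consensus can be over-inclusive for six-codon amino acids: leucine becomes YTN, which also matches the phenylalanine codons TTT and TTC. To obtain a single concrete sequence you would instead pick one codon per amino acid from a codon usage table for your organism.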

When you have identified a potential gene you might want to determine its codon usage. The Codon Adaptation Index (CAI) is a technique for analyzing codon usage bias: it measures how far the codon usage of a given protein-coding gene deviates from that of a reference set of genes.

For quantitative data on general codon usage in different cells, consult the Codon Usage Database (Kazusa DNA Research Institute, Japan). Unfortunately the data are presented in frequency charts, which have to be manually converted to % codon usage for specific amino acids. In addition, the data have not been updated since 2007. For a current database use the Prokaryotes Codon Usage Database (Georgia Institute of Technology, Atlanta, U.S.A.), which provides data on the codon utilization patterns of hundreds of prokaryotes. For information on the codons see: Genetic Code Viewer (EMBL) or DNA analysis (Codon Usage), which is part of The Sequence Manipulation Suite (Paul Stothard) at Bioinformatics.org/The Open Lab.
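
The manual conversion from a frequency chart to % usage per amino acid is simple arithmetic: divide each codon's frequency by the summed frequency of all codons for the same amino acid. A minimal sketch, assuming the standard genetic code and a chart you have already typed in as a dictionary of per-thousand frequencies (the function names are made up, and the numbers in the usage note are invented purely for illustration):

```python
from collections import defaultdict
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def amino_acid(codon):
    """Look up a codon written as DNA (T) or RNA (U)."""
    return CODON_TABLE[codon.upper().replace("U", "T")]


def percent_per_amino_acid(per_thousand):
    """Convert {codon: frequency} to {codon: % among synonymous codons}."""
    totals = defaultdict(float)
    for codon, freq in per_thousand.items():
        totals[amino_acid(codon)] += freq
    return {
        codon: 100.0 * freq / totals[amino_acid(codon)]
        for codon, freq in per_thousand.items()
        if totals[amino_acid(codon)] > 0
    }
```

With the invented values {"TTT": 30.0, "TTC": 10.0}, phenylalanine comes out as 75% TTT and 25% TTC. Only codons present in the input are counted, so enter the complete chart if you want true percentages.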

Inidon (Andre Villegas, LFZ, Public Health Agency of Canada) - this Java-based program reads GenBank *.ffn files (FASTA-formatted gene files) and provides one with the numeric and percentage usage of start codons. The *.ffn files can be downloaded for sequenced genomes from the GenBank genome site. For bacteriophage and other smaller genomes, locate the file using the "search genome" function at NCBI and select "Views - coding regions."  From the next screen use "Save - FASTA nucleotide."
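
The kind of tally described here can be approximated with a short script. This is a sketch of the general idea only, not Inidon's code: it assumes each FASTA record is one gene written 5' to 3' so that its first three bases are the start codon, and the function names are made up.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def start_codon_usage(lines):
    """Return {start codon: (count, percentage)} over all genes."""
    counts = Counter(
        seq[:3].upper() for _, seq in read_fasta(lines) if len(seq) >= 3
    )
    total = sum(counts.values())
    return {codon: (n, 100.0 * n / total) for codon, n in counts.items()}
```

To run it on a downloaded file, pass an open file handle, e.g. start_codon_usage(open("genes.ffn")), where genes.ffn is a placeholder name.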

CodonW - designed to simplify the multivariate analysis (correspondence analysis) of codon and amino acid usage. It also calculates standard indices of codon usage. Details of this program are provided here.

CAI Calculator 2 (John Peden) - Codon usage is biased within and across genomes. The unequal frequency of codons results mainly from the overall base composition of the genome; however, some genes, such as those which are highly expressed, tend to exhibit stronger codon bias. Sharp & Li (1987) proposed the codon adaptation index (CAI) to evaluate how well a gene is adapted to the translational machinery. CAI is a single-value measurement that summarizes the codon usage of a gene relative to the codon usage of a reference set of genes. A higher CAI value usually suggests that the gene of interest is likely to be highly expressed. This site offers the choice of the Sharp & Li (1987) or Eyre-Walker (1996) equations for calculating CAI.
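
For readers who want to see the arithmetic, here is a minimal sketch of the Sharp & Li style calculation: each codon gets a weight equal to its count in the reference set divided by the count of the most used synonymous codon, and CAI is the geometric mean of the weights over the gene. It is not the code behind this site. The standard genetic code, the exclusion of Met, Trp and stop codons, and the 0.5 pseudo-count for codons absent from the reference set are assumptions of this example, and the function names are made up.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_of(cds):
    """Split an in-frame coding sequence into upper-case codons."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]


def relative_adaptiveness(reference_genes):
    """Weight per codon: count / count of the most used synonymous codon."""
    counts = Counter(
        codon for gene in reference_genes for codon in codons_of(gene)
        if codon in CODON_TABLE
    )
    weights = {}
    for aa in set(AMINO) - {"*"}:
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        if len(family) == 1:
            continue  # Met and Trp carry no information about codon choice
        # Assumption: unseen codons get a pseudo-count of 0.5 to avoid log(0).
        family_counts = {c: counts[c] or 0.5 for c in family}
        top = max(family_counts.values())
        for codon, n in family_counts.items():
            weights[codon] = n / top
    return weights


def cai(cds, weights):
    """Geometric mean of the codon weights over the gene."""
    logs = [math.log(weights[c]) for c in codons_of(cds) if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))
```

As a toy check: with a reference "set" of one sequence, TTTTTTTTTTTC (three TTT, one TTC), the weight of TTC is 1/3, so a gene made of TTT TTC scores the square root of 1/3, about 0.58, while TTT TTT scores 1.0. In practice the reference set should be a collection of highly expressed genes from the organism of interest.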

CAIcal - performs several computations in relation to codon usage and the codon adaptation of DNA or RNA sequences to host organisms. (Reference: Puigbo, P. et al. 2008. Biology Direct 3:38).

E-CAI (Expected CAI calculation) - calculates the expected value of the Codon Adaptation Index (CAI) for a set of query sequences by generating random sequences with similar G+C content and amino acid composition to the input. This expected CAI therefore provides a direct threshold value for discerning whether the differences in the CAI value are statistically significant and arise from the codon preferences or whether they are merely artifacts that arise from internal biases in the G+C composition and/or amino acid composition of the query sequences. (Reference: Puigbo, P. et al. 2008. BMC Bioinformatics 9:65).

gcua - Graphical Codon Usage (Universität Regensburg Naturwissenschaftliche Fakultät III, Germany) - offers three possibilities: (a) each triplet position vs usage table - the fraction of usage of each codon in the selected organism is presented; (b) each codon vs. usage table - the fraction of usage of each codon in the submitted sequence will be computed and plotted against the fraction of usage of the codon in the selected organism; and, (c) compare two usage tables - submit or choose two codon usage tables. The fraction of usage of each codon in the submitted usage tables will be compared graphically.

CodonO - synonymous codon usage biases are associated with various biological factors, such as gene expression level, gene length, gene translation initiation signal, protein amino acid composition, protein structure, tRNA abundance, mutation frequency and patterns, and GC composition. CodonO is a user-friendly tool for codon usage bias analyses across and within genomes in real time. (Reference: M.C. Angellotti et al. 2007. Nucl. Acids Res. 35 (Web Server issue): W132-W136).

Rare codon analysis (GenScript USA Inc.) - it is extremely useful to analyze your coding sequences for codon usage prior to attempting protein expression. This tool offers two bacteria (E. coli & Streptomyces), a variety of plants (Nicotiana & Arabidopsis), animals (human & insects) and yeast (Pichia & Saccharomyces).

If you want to express a gene in an organism having a different codon usage:

JCat - Codon Adapter Tool - offers a complete range of eukaryotic & prokaryotic cells, and the ability to select against rho-independent terminators and restriction sites. (Reference: A. Grote et al. 2005. Nucl. Acids Res. 33: W526-W531).

OPTIMIZER: a web server for optimizing the codon usage of DNA sequences - one can use pre-computed tables from more than 150 prokaryotic species under strong translational selection. Three methods of optimization are available: the 'one amino acid - one codon' approach, a random approach or an intermediate one. Several options, such as avoiding specific restriction sites and several outputs, are also available. This server can be useful for predicting and optimizing the level of expression of a gene in heterologous gene expression. (Reference: P. Puigbò et al. 2007. Nucl. Acids Res. 35 (Web Server issue): W126-W131)
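
The 'one amino acid - one codon' approach is the simplest of the three to picture: every codon is replaced by the synonymous codon that the host uses most often. The sketch below illustrates that idea only and is not OPTIMIZER's implementation. It assumes the standard genetic code and a host usage table supplied by you as a dictionary; the function names and the toy numbers in the usage note are invented.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def preferred_codons(usage):
    """Map each amino acid to its most used codon in {codon: frequency}."""
    best = {}
    for codon, freq in usage.items():
        aa = CODON_TABLE[codon]
        if aa not in best or freq > usage[best[aa]]:
            best[aa] = codon
    return best


def optimize(cds, usage):
    """Replace every codon by the host's preferred synonymous codon."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    best = preferred_codons(usage)
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3].upper()
        # Codons with no entry in the usage table are left unchanged.
        out.append(best.get(CODON_TABLE.get(codon), codon))
    return "".join(out)
```

With the toy table {"TTC": 30, "TTT": 10, "CTG": 50, "TTA": 5}, optimize("TTTTTAATG", ...) returns TTCCTGATG: the protein (F-L-M) is unchanged, only the codons differ. A real optimization would also check for unwanted restriction sites and similar features, as the server does.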