General Translations: DNA to Protein

SITES: A number of excellent sites exist, all of which permit translation in all six reading frames. I would recommend "ORF Finder" for its visuals, and Pipeline or GeneMark if you are seriously interested in identifying genes within your sequence. The latter two programs permit the analysis of long sequences (submit these as an attached file rather than pasting them into the box).


Frameshift errors:

path :: protein back-translation and alignment

path :: protein back-translation and alignment - addresses the problem of finding distant protein homologies where the divergence is the result of frameshift mutations and substitutions. Given two input protein sequences, the method implicitly aligns all the possible pairs of DNA sequences that encode them, by manipulating memory-efficient graph representations of the complete set of putative DNA sequences for each protein.
(Reference: Gîrdea M et al. 2010. Algorithms for Molecular Biology 5:)
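
To illustrate why path works on a compact graph rather than on a list of DNA sequences, the Python sketch below builds one list of synonymous codons per residue and multiplies the list sizes to count the DNA sequences that can encode a peptide. These per-residue lists are the layered structure that such a graph represents. This is only an illustration of the size of the search space, assuming the NCBI standard genetic code; it is not path's algorithm.

```python
from itertools import product
from math import prod

BASES = "TCAG"
# NCBI standard genetic code (table 1); codons ordered TTT, TTC, TTA, ..., GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_layers(protein: str) -> list[list[str]]:
    """One list of synonymous codons per residue (one 'layer' per position)."""
    synonyms: dict[str, list[str]] = {}
    for codon, aa in CODON_TABLE.items():
        synonyms.setdefault(aa, []).append(codon)
    return [synonyms[aa] for aa in protein.upper()]


def count_encodings(protein: str) -> int:
    """Number of distinct DNA sequences encoding the protein (stop codon not included)."""
    return prod(len(layer) for layer in codon_layers(protein))


print(count_encodings("MKLSR"))  # 1 * 2 * 6 * 6 * 6 = 432
```

The count is a product over residues, so it grows exponentially with protein length. Enumerating every putative coding sequence therefore becomes impractical very quickly.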


Simple translation tools - DNA to protein sequences:

Open Reading Frame Finder

Open Reading Frame Finder (NCBI) - searches for open reading frames (ORFs) in the DNA sequence you enter. The program returns the range of each ORF, along with its protein translation. Use ORF Finder to search newly sequenced DNA for potential protein-encoding segments, and verify the predicted protein using the newly developed SMART BLAST or regular BLASTP.
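
The scan that an ORF finder performs can be sketched in Python. This is a simplified illustration, not NCBI's implementation. It assumes the standard genetic code and ATG starts by default, and it reports only the most upstream start codon of each ORF on both strands. The min_aa threshold and the starts parameter are illustrative choices.

```python
from itertools import product
from typing import Iterator

BASES = "TCAG"
# NCBI standard genetic code (table 1); codons ordered TTT, TTC, TTA, ..., GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def find_orfs(dna: str, min_aa: int = 30,
              starts: tuple[str, ...] = ("ATG",)) -> Iterator[tuple[str, int, int, int, str]]:
    """Yield (strand, frame, start, end, protein) for each start...stop ORF.

    Coordinates are 0-based and end-exclusive on the strand being scanned. The end
    includes the stop codon, but the protein does not. ORFs that run off the end
    of the sequence are ignored.
    """
    dna = dna.upper()
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            i = frame
            while i + 3 <= len(seq):
                if seq[i:i + 3] in starts:
                    protein = []
                    for j in range(i, len(seq) - 2, 3):
                        aa = CODON_TABLE.get(seq[j:j + 3], "X")  # X for codons containing N etc.
                        if aa == "*":
                            if len(protein) >= min_aa:
                                yield strand, frame, i, j + 3, "".join(protein)
                            i = j  # resume scanning after this stop codon
                            break
                        protein.append(aa)
                    else:
                        break  # no in-frame stop before the end: nothing more in this frame
                i += 3


print(list(find_orfs("CCATGAAATTTTAGCC", min_aa=3)))  # [('+', 2, 2, 14, 'MKF')]
```

If alternative starts such as GTG are passed in starts, this sketch reports the first residue as V. Most ORF tools report M in that position.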


Six-frame Translations

Six-frame Translations can be done at Tuebingen and Bioline.


EMBOSS Sixpack

EMBOSS Sixpack (EMBL-EBI) - reads a DNA sequence and outputs the three forward and (optionally) three reverse translations in a visual manner. Alternatively, use EMBOSS Transeq.
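
A six-frame translation like the one Sixpack displays takes only a few lines of Python. The sketch below assumes the standard genetic code and accepts DNA or RNA input. It is not EMBOSS code. Its -1/-2/-3 labels count offsets from the 5' end of the reverse complement, and tools differ in how they number the reverse frames.

```python
from itertools import product

BASES = "TCAG"
# NCBI standard genetic code (table 1); codons ordered TTT, TTC, TTA, ..., GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate codon by codon; a trailing partial codon is dropped, unknown codons become X."""
    dna = dna.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict[str, str]:
    """Three forward and three reverse-complement translations, keyed +1..+3 and -1..-3."""
    dna = dna.upper().replace("U", "T")
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


for name, protein in six_frames("ATGAAATTTTAG").items():
    print(name, protein)
```

Stop codons appear as "*", which makes it easy to spot the frames that contain long uninterrupted stretches.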


Other DNA to Protein translation sites

Other DNA to Protein translation sites can be found here (University of Gothenburg, Sweden) and here (University of the Basque Country, Spain).


Translate

Translate (ExPASy, Switzerland) - a tool that translates a nucleotide (DNA/RNA) sequence into a protein sequence.


Translation of multiple sequences:

Virtual Ribosome

Virtual Ribosome - The Virtual Ribosome is a comprehensive tool for translating DNA sequences into the corresponding peptide sequences. Besides being a strong translation tool in its own right (with an integrated ORF finder, support for all translation tables defined by the NCBI taxonomy group, and a number of options regarding START and STOP codons), the Virtual Ribosome can work directly on files containing annotation of gene structure. This makes it easy to map various aspects of intron/exon structure onto the translated sequence.
(Reference: R. Wernersson. 2006. Nucl. Acids Res. 34 (Web Server issue): W385-388)


RevTrans

RevTrans - takes a set of DNA sequences, virtually translates them, aligns the peptide sequences, and uses this as a scaffold for constructing the corresponding DNA multiple alignment. New in RevTrans 2.0: integration with Virtual Ribosome for translation and ORF finding, visualization of alignments using JalView, more alignment programs (MAFFT, T-COFFEE, Dialign 2, Dialign-T and ClustalW2), and an improved tab-based interface.
(Reference: Wernersson R & Pedersen AG (2003) Nucl. Acids Res. 31(13): 3537-3539).
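
The scaffold step described for RevTrans (and, in a more robust form, for PAL2NAL below) amounts to threading each DNA sequence back onto its row of the protein alignment. A minimal Python sketch follows. It assumes each CDS translates exactly into its ungapped protein. It does not check the translation, and it does not handle the mismatches, UTRs or frameshifts that PAL2NAL tolerates. The example sequences are hypothetical.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Back-thread one coding sequence onto its row of a protein alignment.

    Each residue is replaced by the next codon of the CDS and each gap by three
    gap characters. A trailing stop codon is simply left unused.
    """
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    n_residues = sum(ch != gap for ch in aligned_protein)
    if len(codons) < n_residues:
        raise ValueError("CDS has fewer codons than the protein has residues")
    next_codon = iter(codons)
    return "".join(gap * 3 if ch == gap else next(next_codon) for ch in aligned_protein)


# Hypothetical two-sequence protein alignment and the matching coding sequences.
protein_rows = {"seq1": "M-KF", "seq2": "MRK-"}
cds = {"seq1": "ATGAAATTTTAG", "seq2": "ATGCGTAAA"}
for name, row in protein_rows.items():
    print(name, thread_codons(row, cds[name]))
```

Because gaps are inserted only as whole codons, the resulting DNA alignment never shifts the reading frame. This is the main reason to align at the protein level first.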


TranslatorX

TranslatorX - is a web server designed to align protein-coding nucleotide sequences based on their corresponding amino acid translations. TranslatorX novelties include: (i) use of all documented genetic codes and the possibility of assigning different genetic codes for each sequence; (ii) a battery of different multiple alignment programs; (iii) translation of ambiguous codons when possible; (iv) an innovative criterion to clean nucleotide alignments with GBlocks based on protein information; and (v) a rich output, including Jalview-powered graphical visualization of the alignments, codon-based alignments coloured according to the corresponding amino acids, measures of compositional bias and first, second and third codon position specific alignments.
(Reference: Abascal F, et al. (2010) Nucleic Acids Res. 38: W7-13).
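
The first-, second- and third-position alignments mentioned above are simply every third column of a codon alignment. A short Python sketch can derive them. It also computes the G+C fraction of a position-specific row, since GC content at the third codon position (GC3) is one common measure of compositional bias. This is an illustration only and is not TranslatorX's own calculation.

```python
def codon_position_rows(codon_row: str) -> tuple[str, str, str]:
    """Split one codon-aligned row into its 1st-, 2nd- and 3rd-position columns."""
    if len(codon_row) % 3:
        raise ValueError("codon alignment length must be a multiple of 3")
    return codon_row[0::3], codon_row[1::3], codon_row[2::3]


def gc_fraction(row: str, gap: str = "-") -> float:
    """G+C fraction over the non-gap characters of a row."""
    bases = [b for b in row.upper() if b != gap]
    return sum(b in "GC" for b in bases) / len(bases) if bases else 0.0


first, second, third = codon_position_rows("ATG---AAGCTC")
print(third, gc_fraction(third))
```

The same slicing, applied to every row of the alignment, gives the three position-specific alignments.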


Back-translation, i.e. taking a protein sequence and deriving a DNA sequence that could encode it:

Back Translation

Back Translation - part of The Sequence Manipulation Suite; limited choice of codon usage (E. coli and H. sapiens)
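
The simplest back-translation emits one preferred codon per residue for the chosen organism. The Python sketch below assumes a small hypothetical preference table, so substitute values from a real codon usage table for your host. It is not the code used by the sites listed here.

```python
# Hypothetical preference table: one "preferred" codon per residue. These choices are
# illustrative assumptions; take real values from a codon usage table for your host.
PREFERRED_CODON = {
    "M": "ATG", "K": "AAA", "L": "CTG", "S": "AGC", "R": "CGT", "*": "TAA",
}


def back_translate(protein: str, table: dict[str, str] = PREFERRED_CODON) -> str:
    """Back-translate by emitting the single preferred codon for every residue."""
    dna = []
    for aa in protein.upper():
        if aa not in table:
            raise ValueError(f"no codon defined for residue {aa!r}")
        dna.append(table[aa])
    return "".join(dna)


print(back_translate("MKLSR*"))  # ATGAAACTGAGCCGTTAA
```

This is the "one amino acid - one codon" strategy. It hides the fact that most residues have several synonymous codons, which is why the choice of codon usage table matters.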


Protein to DNA reverse translation

Protein to DNA reverse translation - includes a wide range of genetic codes (BioPHP: PHP for Bioinformatics)


BackTranslator

BackTranslator (Max Planck Institute for Biology, Tübingen)


Identification of open-reading frames:

StarORF

StarORF - facilitates the identification of the protein(s) encoded within a DNA sequence. StarORF first transcribes the DNA sequence into RNA and then translates all the potential ORFs (open reading frames) encoded within each of the six translation frames (3 in the forward direction and 3 in the reverse direction). This allows students to identify the translation frame that yields the longest protein-coding sequence.


GeneMark Homepage

GeneMark Homepage (M. Borodovsky, Georgia Institute of Technology, Atlanta, U.S.A.) offers a family of programs for ORF analysis. The site links to a growing number of programs for modeling phage, bacterial, and eukaryotic data. The output can be controlled extensively; for example, one can request the nucleotide and protein sequences of the ORFs. Programs to consider include GeneMarkS
(Reference: Besemer J et al. 2001. Nucleic Acids Research; 29:2607-2618)
or GeneMarkS-2, and the Heuristic Approach for Gene Prediction
(Reference: Besemer J & Borodovsky M. 1999. Nucleic Acids Research; 27:3911-3920).
For metagenomic analysis use MetaGeneMark
(Reference: Zhu, W. et al. 2010. Nucleic Acids Research; 38: e132).


EasyGene

EasyGene - produces a list of predicted genes given a sequence of prokaryotic DNA. Each prediction is assigned a significance score (R-value) indicating how likely it is to be just a non-coding open reading frame rather than a real gene. The user needs only to specify the organism hosting the query sequence.
(Reference: T.S. Larsen & A. Krogh. (2003). BMC Bioinformatics 4: 21)


FramePlot 4.0

FramePlot 4.0 (National Institute of Health, Japan) - This site permits one to select the minimal ORF size and the start codon (ATG and GTG being the most common). Although the presentation (a series of coloured arrows) is somewhat confusing, clicking on any arrow displays the DNA and protein sequences. These can be used in homology (BLASTN & BLASTP) searches.
(Reference: Ishikawa J & Hotta K. 1999. FEMS Microbiol. Lett. 174: 251-253).


ExPASy

ExPASy – Translate tool (ExPASy, University of Geneva, Switzerland). I find this site useful if I have a gene which begins with an alternative start codon.


Codon usage:

When you have identified a potential gene, you might want to determine its codon usage. The Codon Adaptation Index (CAI) is a technique for analyzing codon usage bias. CAI measures the deviation of a given protein-coding gene sequence with respect to a reference set of genes.


Codon Usage Database

For quantitative data on general codon usage in different cells, consult the Codon Usage Database (Kazusa DNA Research Institute, Japan). Unfortunately, the data is presented in frequency charts that have to be converted manually to % codon usage for specific amino acids. In addition, the data has not been updated since 2007. For information on the codons, see DNA analysis (Codon Usage), which is part of The Sequence Manipulation Suite (Paul Stothard) at Bioinformatics.org/The Open Lab.
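
The manual conversion mentioned above divides each codon's frequency by the summed frequency of all codons for the same amino acid: % usage = 100 × f(codon) / Σ f(synonymous codons). The Python sketch below assumes the standard genetic code and per-thousand values for every synonymous codon. The valine numbers are hypothetical and are not taken from the database.

```python
from itertools import product

BASES = "TCAG"
# NCBI standard genetic code (table 1); codons ordered TTT, TTC, TTA, ..., GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def amino_acid_of(codon: str) -> str:
    return CODON_TABLE[codon.upper().replace("U", "T")]  # accept RNA (U) or DNA (T) codons


def percent_usage(per_thousand: dict[str, float]) -> dict[str, float]:
    """Convert codon frequencies (per thousand codons) into % usage within each amino acid."""
    totals: dict[str, float] = {}
    for codon, freq in per_thousand.items():
        aa = amino_acid_of(codon)
        totals[aa] = totals.get(aa, 0.0) + freq
    usage = {}
    for codon, freq in per_thousand.items():
        total = totals[amino_acid_of(codon)]
        usage[codon] = 100.0 * freq / total if total else 0.0
    return usage


# Hypothetical per-thousand values for the four valine codons (not real data).
valine = {"GTT": 10.0, "GTC": 15.0, "GTA": 5.0, "GTG": 30.0}
print(percent_usage(valine))
```

Passing a whole table at once works the same way, because the codons are grouped by the amino acid they encode.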


Inidon

Inidon (Andre Villegas, Public Health Ontario, Canada) - this Java-based program reads GenBank *.ffn files (FASTA-formatted gene files) and reports the number and percentage of each start codon used. The *.ffn files for sequenced genomes can be downloaded from the GenBank genome site. For bacteriophages and other smaller genomes, locate the file using the "search genome" function at NCBI and select "Views - coding regions." From the next screen use "Save - FASTA nucleotide." This program is currently unavailable online, but the Perl script can be downloaded from here.
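
The start codon tally that Inidon reports can be approximated in Python. This is an independent sketch, not the Inidon program or its Perl script. It assumes each FASTA record is a coding sequence written 5'→3' in the coding orientation and beginning at its annotated start codon. The example records are hypothetical.

```python
from collections import Counter


def start_codon_usage(fasta_text: str) -> dict[str, tuple[int, float]]:
    """Tally the first codon of every FASTA record: {codon: (count, percent)}."""
    starts = Counter()
    current = []

    def flush():
        # Record the first codon of the sequence collected so far, then reset.
        seq = "".join(current).upper()
        if len(seq) >= 3:
            starts[seq[:3]] += 1
        current.clear()

    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            flush()
        elif line:
            current.append(line)
    flush()
    total = sum(starts.values())
    return {codon: (n, 100.0 * n / total) for codon, n in starts.most_common()}


example = ">gene1\nATGAAA\nTAA\n>gene2\nGTGCCCTGA\n>gene3\natgTTTTAG\n"
print(start_codon_usage(example))
```

For a downloaded genome, read the *.ffn file into a string and pass it to start_codon_usage.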


CAI Calculator 2

CAI Calculator 2 (John Peden) - Codon usage is biased within and across genomes. The unequal frequency of codons results mainly from the overall base composition of the genome; however, some genes, such as those that are highly expressed, tend to exhibit stronger codon bias. Sharp & Li (1987) proposed using the codon adaptation index to evaluate how well a gene is adapted to the translational machinery. CAI is a single value that summarizes the codon usage of a gene relative to the codon usage of a reference set of genes. A higher CAI value usually suggests that the gene of interest is likely to be highly expressed. This site offers the choice of the Sharp & Li (1987) or Eyre-Walker (1996) equations for calculating CAI.
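
In the Sharp & Li formulation, each codon gets a relative adaptiveness w: its count in the reference genes (typically highly expressed ones) divided by the count of the most used synonymous codon. CAI is then the geometric mean of w over the L codons of the gene, CAI = (w1 × w2 × ... × wL)^(1/L). The Python sketch below follows that idea under stated assumptions. It uses the standard genetic code and excludes stop codons and single-codon amino acids (Met, Trp). It also gives unobserved codons a pseudocount of 0.5, which is a common convention rather than something stated here. It does not implement the Eyre-Walker variant.

```python
from collections import Counter
from itertools import product
from math import exp, log

BASES = "TCAG"
# NCBI standard genetic code (table 1); codons ordered TTT, TTC, TTA, ..., GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons_of(cds: str) -> list[str]:
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]


def relative_adaptiveness(reference_cds: list[str], pseudocount: float = 0.5) -> dict[str, float]:
    """w = count(codon) / count(most used synonymous codon) over the reference genes."""
    counts = Counter()
    for cds in reference_cds:
        counts.update(codons_of(cds))
    synonyms: dict[str, list[str]] = {}
    for codon, aa in CODON_TABLE.items():
        synonyms.setdefault(aa, []).append(codon)
    weights = {}
    for aa, group in synonyms.items():
        if aa == "*" or len(group) == 1:
            continue  # stops and single-codon amino acids carry no usage information
        observed = {c: counts[c] or pseudocount for c in group}  # assumed pseudocount for unseen codons
        best = max(observed.values())
        for c in group:
            weights[c] = observed[c] / best
    return weights


def cai(cds: str, weights: dict[str, float]) -> float:
    """Geometric mean of w over the gene's codons that have a weight."""
    logs = [log(weights[c]) for c in codons_of(cds) if c in weights]
    return exp(sum(logs) / len(logs)) if logs else float("nan")


w = relative_adaptiveness(["AAAAAAAAG"])  # toy reference set: AAA used twice, AAG once
print(w["AAA"], w["AAG"], cai("AAAAAG", w))
```

Averaging logarithms rather than multiplying the w values directly avoids numerical underflow for long genes.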


CAIcal

CAIcal - performs several computations in relation to codon usage and the codon adaptation of DNA or RNA sequences to host organisms.
(Reference: Puigbo, P. et al. 2008. Biology Direct 3:38).


E-CAI

E-CAI (Expected CAI calculation) - calculates the expected value of the Codon Adaptation Index (CAI) for a set of query sequences by generating random sequences with similar G+C content and amino acid composition to the input. This expected CAI therefore provides a direct threshold value for discerning whether the differences in the CAI value are statistically significant and arise from the codon preferences or whether they are merely artifacts that arise from internal biases in the G+C composition and/or amino acid composition of the query sequences.
(Reference: Puigbo, P. et al. 2008. BMC Bioinformatics 9:65).


GCUA

GCUA - Graphical Codon UsAge (Universität Regensburg Naturwissenschaftliche Fakultät III, Germany) - offers three possibilities: (a) each triplet position vs usage table - the fraction of usage of each codon in the selected organism is presented; (b) each codon vs. usage table - the fraction of usage of each codon in the submitted sequence will be computed and plotted against the fraction of usage of the codon in the selected organism; and, (c) compare two usage tables - submit or choose two codon usage tables. The fraction of usage of each codon in the submitted usage tables will be compared graphically.


Codon Statistics Database

Codon Statistics Database: A Database of Codon Usage Bias - Enter a taxonomy ID (e.g. "9606"), the name of a species (e.g. "Human" or "Homo sapiens") or a group of species (e.g. "Primates"). Then select an option from the drop-down menu and press "Submit". It then provides two sets of tables. One set lists, for each codon, the frequency, the Relative Synonymous Codon Usage, and whether the codon is preferred. Another set of tables lists, for each gene, its GC content, Effective Number of Codons, Codon Adaptation Index, and frequency of optimal codons.
(Reference: Subramanian K et al. (2022) Molec Biol Evol 39(8) DOI: https://doi.org/10.1093/molbev/msac157)
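
Relative Synonymous Codon Usage is usually defined as a codon's observed count divided by the count expected if all synonymous codons were used equally: RSCU = X_ij / ((1/n_i) Σ_j X_ij), where n_i is the number of codons for amino acid i. Values above 1 mark codons used more often than expected. A minimal Python sketch of this standard definition follows, assuming the standard genetic code. Whether the Codon Statistics Database applies any further corrections is not stated here.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# NCBI standard genetic code (table 1); codons ordered TTT, TTC, TTA, ..., GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def rscu(cds: str) -> dict[str, float]:
    """RSCU for every codon of the amino acids that occur in the gene."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    synonyms: dict[str, list[str]] = {}
    for codon, aa in CODON_TABLE.items():
        synonyms.setdefault(aa, []).append(codon)
    values = {}
    for group in synonyms.values():
        total = sum(counts[c] for c in group)
        if total == 0:
            continue  # amino acid absent from this gene
        expected = total / len(group)  # count expected under equal synonymous usage
        for c in group:
            values[c] = counts[c] / expected
    return values


print(rscu("GTGGTGGTAGTC"))  # RSCU values: GTT 0.0, GTC 1.0, GTA 1.0, GTG 2.0
```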


Rare codon analysis tool

Rare codon analysis tool (GenScript USA Inc.) - it is extremely useful to analyze your coding sequences for codon usage before attempting protein expression. This tool offers two bacteria (E. coli & Streptomyces), a variety of plants (Nicotiana & Arabidopsis), animals (human & insects) and yeasts (Pichia & Saccharomyces).
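
A basic rare codon check compares each codon of the coding sequence with its usage fraction in the expression host. The Python sketch below assumes a small hypothetical host table (fraction of each codon among its synonyms) and an arbitrary 10% threshold. It does not reproduce GenScript's scoring.

```python
# Hypothetical host usage fractions (share of each codon among its synonyms); illustrative only.
HOST_FRACTION = {"ATG": 1.00, "CGT": 0.38, "AGG": 0.02, "AGA": 0.04}


def flag_rare_codons(cds: str, host_fraction: dict[str, float],
                     threshold: float = 0.10) -> list[tuple[int, str, float]]:
    """Return (codon number, codon, fraction) for every codon used below `threshold` in the host."""
    cds = cds.upper()
    flagged = []
    for k in range(len(cds) // 3):
        codon = cds[3 * k:3 * k + 3]
        fraction = host_fraction.get(codon)  # codons missing from the table are skipped
        if fraction is not None and fraction < threshold:
            flagged.append((k + 1, codon, fraction))  # codon numbers are 1-based
    return flagged


print(flag_rare_codons("ATGCGTAGGAGA", HOST_FRACTION))  # [(3, 'AGG', 0.02), (4, 'AGA', 0.04)]
```

Runs of consecutive rare codons are usually of more concern than isolated ones, so it can help to look at where the flagged positions cluster.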


PAL2NAL

PAL2NAL - a program that converts a multiple sequence alignment of proteins and the corresponding DNA (or mRNA) sequences into a codon alignment. The program automatically assigns the corresponding codon sequence even if the input DNA sequence has mismatches with the input protein sequence, or contains UTRs or poly(A) tails. It can also deal with frameshifts in the input alignment, which makes it suitable for the analysis of pseudogenes. The resulting codon alignment can then be used to calculate synonymous (dS) and non-synonymous (dN) substitution rates.
(Reference: Suyama M et al. 2006. Nucleic Acids Res. 34: W609-W612).


If you want to express a gene in an organism having different codon usage:

JCat

JCat - Codon Adapter Tool - offers a wide range of eukaryotic and prokaryotic host organisms, and the ability to select against rho-independent terminators and restriction sites.
(Reference: A. Grote et al. 2005. Nucl. Acids Res. 33: W526-W531).
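
Avoiding restriction sites during codon adaptation first requires locating them on both strands. A minimal Python sketch follows. It handles exact recognition sequences only, not degenerate IUPAC sites, and it is not JCat's code. The enzyme list is a small example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_sites(dna: str, sites: dict[str, str]) -> list[tuple[str, int]]:
    """Return (enzyme, 0-based forward-strand position) for exact matches on either strand."""
    dna = dna.upper()
    hits = set()
    for enzyme, site in sites.items():
        site = site.upper()
        reverse = site.translate(COMPLEMENT)[::-1]
        for motif in {site, reverse}:  # a set, so palindromic sites are scanned once
            pos = dna.find(motif)
            while pos != -1:
                hits.add((enzyme, pos))
                pos = dna.find(motif, pos + 1)
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "BsaI": "GGTCTC"}
print(find_sites("AAGAATTCCCGAGACCGGATCC", ENZYMES))  # [('EcoRI', 2), ('BsaI', 10), ('BamHI', 16)]
```

BsaI is included because its site is not palindromic. It is found here only through its reverse complement (GAGACC), which shows why both strands must be searched.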


OPTIMIZER

OPTIMIZER: a web server for optimizing the codon usage of DNA sequences - one can use pre-computed tables from more than 150 prokaryotic species under strong translational selection. Three optimization methods are available: the 'one amino acid - one codon' approach, a random approach, or an intermediate one. Several options, such as avoiding specific restriction sites, and several outputs are also available. This server can be useful for predicting and optimizing the expression level of a gene in heterologous gene expression.
(Reference: P. Puigbò et al. 2007. Nucl. Acids Res. 35(Web Server issue): W126-131).


IDT Codon Optimization Tool

IDT Codon Optimization Tool - was developed to optimize a DNA or protein sequence from one organism for expression in another by reassigning codons based on the frequency of each codon's usage in the new organism. For example, valine is encoded by 4 different codons (GUG, GUU, GUC, and GUA). In human cell lines, however, the GUG codon is used preferentially (46% vs. 18, 24, and 12%, respectively). The codon optimization tool takes this information into account and assigns valine codons with those same frequencies. In addition, the algorithm eliminates codons with less than 10% frequency and renormalizes the remaining frequencies to 100%. The tool also reduces complexities that can interfere with manufacturing and downstream expression, such as repeats, hairpins, and extreme GC content. Requires registration.
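
The frequency handling described for the IDT tool can be sketched in Python: drop codons below the 10% cutoff, rescale the rest to sum to 1, and draw codons in proportion to the rescaled frequencies. This is an illustration of the described idea, not IDT's algorithm. Codons are written with T instead of U, and the leucine fractions are hypothetical.

```python
import random


def renormalize(freqs: dict[str, float], cutoff: float = 0.10) -> dict[str, float]:
    """Drop codons used below `cutoff` and rescale the remaining fractions to sum to 1."""
    kept = {codon: f for codon, f in freqs.items() if f >= cutoff}
    total = sum(kept.values())
    return {codon: f / total for codon, f in kept.items()}


def sample_codons(n: int, freqs: dict[str, float], rng: random.Random) -> list[str]:
    """Draw n codons (for one amino acid) in proportion to `freqs`."""
    codons = list(freqs)
    return rng.choices(codons, weights=[freqs[c] for c in codons], k=n)


# Valine fractions quoted in the text: none is under 10%, so they are unchanged.
valine = {"GTG": 0.46, "GTT": 0.18, "GTC": 0.24, "GTA": 0.12}
# Hypothetical leucine fractions with two codons under the 10% cutoff.
leucine = {"CTG": 0.40, "CTC": 0.20, "CTT": 0.13, "TTG": 0.13, "TTA": 0.07, "CTA": 0.07}
print(renormalize(leucine))  # TTA and CTA dropped; CTG becomes 0.40 / 0.86, about 0.465
print(sample_codons(5, renormalize(valine), random.Random(1)))
```

Sampling by frequency, rather than always choosing the top codon, spreads usage across synonymous codons in roughly the host's proportions.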


GenSmart™ Codon Optimization

GenSmart™ Codon Optimization - is a free, user-friendly online tool that enables you to optimize the design of wild type or recombinant gene sequences towards higher expression in prokaryotic and mammalian expression systems.


VectorBuilder Codon Optimization Tool

VectorBuilder's Codon Optimization Tool is designed to help you achieve the optimal codon adaptation index (CAI) for your gene of interest (GOI) in any organism of your choice. It includes a comprehensive list of species and is integrated into VectorBuilder's online vector design platform, so you can optimize your GOIs while designing vectors. Additionally, it allows you to avoid cleavage sites of selected restriction enzymes while codon-optimizing your target sequence. The tool can be used to optimize sequences with extreme GC content and simple repeats for highly efficient gene synthesis and DNA cloning applications.
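
Stretches of extreme GC content, which both the IDT and VectorBuilder tools try to avoid, can be located with a sliding window. The Python sketch below illustrates this. The window size, step and the 30% and 70% thresholds are hypothetical defaults, not values used by either vendor.

```python
def gc_windows(dna: str, window: int = 50, step: int = 10) -> list[tuple[int, float]]:
    """(0-based start, G+C fraction) for every complete window along the sequence."""
    dna = dna.upper()
    return [
        (start, (dna.count("G", start, start + window) + dna.count("C", start, start + window)) / window)
        for start in range(0, len(dna) - window + 1, step)
    ]


def extreme_gc(dna: str, low: float = 0.30, high: float = 0.70,
               window: int = 50, step: int = 10) -> list[tuple[int, float]]:
    """Windows whose G+C fraction falls outside [low, high]."""
    return [(s, gc) for s, gc in gc_windows(dna, window, step) if gc < low or gc > high]


print(extreme_gc("GCGCATAT", window=4, step=2))  # [(0, 1.0), (4, 0.0)]
```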


RBS Calculator

RBS Calculator - the developers built a biophysical model, based on thermodynamic first principles and a four-parameter free energy model, that accurately predicts the ribosome's translation initiation rates for 136 synthetic 5′ UTRs with large structures, diverse shapes and multiple standby site modules. The model predicts, and experiments confirm, that the ribosome can readily bind distant standby site modules that support high translation rates, providing a physical mechanism for observed context effects and long-range post-transcriptional regulation.
(Reference: A. E. Borujeni, et al. 2014. Nucleic Acids Research; 42 (4): 2646–2659).


IRES (Internal Ribosome Entry Site) segments are known to recruit the eukaryotic translation initiation complex and thus promote translation initiation independently of the commonly used 5'-terminal m7G (7-methylguanosine) cap structure. It is not yet clear whether this activity can be attributed to a common sequence or to a common secondary structure. IRES regions have been found in a broad range of positive-strand (+)RNA viruses and in the untranslated regions of some eukaryotic cellular mRNAs. Database 1; Database 2



IRESpy

IRESpy - is a fast, reliable, high-throughput IRES online prediction tool. It provides a publicly available tool for all IRES researchers, and can be used in other genomics applications such as gene annotation and analysis of differential gene expression.
(Reference: Wang J & Gribskov M (2019) BMC Bioinformatics 20: 409).


IRESPred

IRESPred - was developed to predict both viral and cellular IRESs using a Support Vector Machine (SVM). The predictive model was built using 35 features based on the sequence and structural properties of UTRs and the probabilities of interactions between UTRs and small subunit ribosomal proteins (SSRPs). The model was found to have 75.51% accuracy, 75.75% sensitivity, 75.25% specificity, and 75.75% precision.
(Reference: Kolekar P et al. (2016) Sci Rep. 6: 27436).
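
The four figures quoted for IRESPred are standard confusion-matrix metrics. Writing out their definitions in Python makes it easier to compare them with other predictors. The counts in the example are hypothetical and are not IRESPred's evaluation data.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard binary-classifier metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true-positive rate (recall)
        "specificity": tn / (tn + fp),   # true-negative rate
        "precision": tp / (tp + fp),     # positive predictive value
    }


# Hypothetical counts for illustration only (not IRESPred's evaluation data).
print(classification_metrics(tp=75, fp=30, tn=70, fn=25))
```

Sensitivity and specificity do not depend on the ratio of IRES to non-IRES sequences in the test set, but precision and accuracy do. Keep this in mind when comparing figures across studies.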


IRESbase

IRESbase is a comprehensive database of experimentally verified viral and eukaryotic internal ribosome entry sites (IRESs) with BLAST search capability.
(Reference: Wu TY et al. (2009) BMC Bioinformatics 10: 160).

Updated: November, 2025