General Translations: DNA to Protein
SITES: A number of excellent sites exist, all of which permit translation in all six reading frames. I would recommend "ORF Finder" because of its visuals, and Pipeline or GeneMark if you are seriously interested in identifying genes within your sequence. The latter two programs permit the analysis of long sequences (submit the sequence as an attachment, not in the text box).
Frameshift errors:
path :: protein back-translation and alignment
path :: protein back-translation and alignment
- addresses the problem of finding distant protein homologies where the
divergence is the result of frameshift mutations and substitutions. Given
two input protein sequences, the method implicitly aligns all the
possible pairs of DNA sequences that encode them, by manipulating
memory-efficient graph representations of the complete set of putative
DNA sequences for each protein.
(Reference: Gîrdea M et al. 2010. Algorithms for
Molecular Biology 5:)
Simple translation tools - DNA to protein sequences:
Open Reading Frame Finder
Open Reading Frame Finder (NCBI) - searches for open reading frames (ORFs) in the DNA sequence you enter. The program returns the range of each ORF, along with its protein translation. Use ORF Finder to search newly sequenced DNA for potential protein-encoding segments, then verify the predicted protein using the newly developed SMART BLAST or regular BLASTP.
Six-frame Translations
Six-frame Translations can be done at Tuebingen and Bioline.
EMBOSS Sixpack
EMBOSS Sixpack (EMBL-EBI) - reads a DNA sequence and outputs the three forward and (optionally) three reverse translations in a visual manner. Alternatively, use EMBOSS Transeq.
Other DNA to Protein translation sites
Other DNA to Protein translation sites are to be found here (University of Gothenburg, Sweden) and here (University of the Basque Country, Spain).
Translate
Translate (ExPASy, Switzerland) - is a tool which allows the translation of a nucleotide (DNA/RNA) sequence to a protein sequence.
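The six-frame translation performed by the tools above can be sketched in a few lines of Python. This is only a minimal illustration, not the code used by any of these sites: it assumes the standard genetic code (NCBI translation table 1) and an unambiguous A/C/G/T sequence, and the function names and frame labels (+1 to +3, -1 to -3) are chosen here for convenience.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an A/C/G/T sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate from the first base; unknown codons become 'X',
    and a trailing partial codon is ignored."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict mapping frame label to protein for all six frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


frames = six_frame_translation("ATGGCCTAA")
print(frames["+1"])  # MA*
print(frames["-1"])  # LGH
```

Stop codons are shown as "*", so the frame with the longest stretch free of "*" is usually the first candidate for a coding region.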
Translation of multiple sequences:
Virtual Ribosome
Virtual Ribosome
- The Virtual Ribosome is a comprehensive tool for translating DNA
sequences to the corresponding peptide sequences. Besides being a strong
translation tool in its own right (with an integrated ORF finder, support
for all translation tables defined by the NCBI taxonomy group, and a
number of options regarding START and STOP codons), the Virtual Ribosome
can work directly on files containing annotation of gene structure. This
makes it easy to map various aspects of intron/exon structure onto the
translated sequence.
(Reference: R. Wernersson. 2006. Nucl. Acids Res. 34
(Web Server issue): W385-388)
RevTrans
RevTrans
- takes a set of DNA sequences, virtually translates them, aligns the
peptide sequences, and uses this as a scaffold for constructing the
corresponding DNA multiple alignment. New in RevTrans 2.0: Integration
with Virtual Ribosome for translation and ORF finding, visualization of
alignments using JalView, more alignment programs: MAFFT, T-COFFEE,
Dialign 2, Dialign-T and ClustalW2. Improved tab-based interface.
(Reference: Wernersson R & Pedersen AG (2003) Nucl.
Acids Res. 31(13): 3537-3539).
TranslatorX
TranslatorX
- is a web server designed to align protein-coding nucleotide sequences
based on their corresponding amino acid translations. TranslatorX
novelties include: (i) use of all documented genetic codes and the
possibility of assigning different genetic codes for each sequence; (ii) a
battery of different multiple alignment programs; (iii) translation of
ambiguous codons when possible; (iv) an innovative criterion to clean
nucleotide alignments with GBlocks based on protein information; and (v) a
rich output, including Jalview-powered graphical visualization of the
alignments, codon-based alignments coloured according to the corresponding
amino acids, measures of compositional bias and first, second and third
codon position specific alignments.
(Reference: Abascal F, et al. (2010) Nucleic Acids
Res. 38: W7-13).
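The core idea shared by RevTrans and TranslatorX, using a protein alignment as the scaffold for a DNA alignment, can be sketched as follows. This is a simplified illustration, not the implementation of either server: it assumes the protein alignment has already been produced by an external aligner, that each coding sequence starts in frame and encodes exactly the aligned protein, and the function names are hypothetical.

```python
def thread_codons(aligned_protein, cds, gap="-"):
    """Map a coding sequence onto its gapped protein: each residue takes
    the next codon, each gap becomes three gap characters."""
    n_residues = sum(1 for ch in aligned_protein if ch != gap)
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence is shorter than the aligned protein")
    out = []
    pos = 0
    for ch in aligned_protein:
        if ch == gap:
            out.append(gap * 3)
        else:
            out.append(cds[pos:pos + 3])
            pos += 3
    return "".join(out)


def codon_alignment(protein_alignment, cds_by_name):
    """Build a codon alignment from {name: gapped protein} and {name: CDS}."""
    return {
        name: thread_codons(aligned, cds_by_name[name])
        for name, aligned in protein_alignment.items()
    }


aligned = {"seqA": "MK-F", "seqB": "M-RF"}
cds = {"seqA": "ATGAAATTT", "seqB": "ATGCGTTTC"}
for name, row in codon_alignment(aligned, cds).items():
    print(name, row)
# seqA ATGAAA---TTT
# seqB ATG---CGTTTC
```

Because gaps are always inserted in multiples of three, the reading frame is preserved across the whole alignment.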
Backtranslation, i.e. taking a protein sequence and deriving a DNA sequence that could encode it:
Back Translation
Back Translation - part of The Sequence Manipulation Suite; limited choice of codon usage (E. coli and H. sapiens)
Protein to DNA reverse translation
Protein to DNA reverse translation - includes a wide range of genetic codes (BioPHP - PHP for Bioinformatics)
BackTranslator
BackTranslator (Max Planck Institute for Biology, Tübingen)
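Because most amino acids are encoded by several codons, a protein has many possible back-translations, and a tool must pick one codon per residue. The sketch below is an illustration only, assuming the standard genetic code: it picks the highest-weighted codon for each amino acid from an optional, user-supplied weight table (for example, codon usage values for the target organism) and falls back to the first codon in table order when no weights are given. The function names and the example weights are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def synonymous_codons():
    """Group codons by the amino acid (or stop, '*') they encode."""
    groups = {}
    for codon, aa in CODON_TABLE.items():
        groups.setdefault(aa, []).append(codon)
    return groups


def back_translate(protein, codon_weights=None):
    """Return one DNA sequence encoding the protein, choosing for each
    residue the codon with the highest weight (ties: first in table order).
    Raises KeyError for symbols outside the standard code."""
    groups = synonymous_codons()
    weights = codon_weights or {}
    return "".join(
        max(groups[aa], key=lambda c: weights.get(c, 0.0))
        for aa in protein.upper()
    )


def count_encodings(protein):
    """Number of distinct DNA sequences that encode the protein."""
    groups = synonymous_codons()
    total = 1
    for aa in protein.upper():
        total *= len(groups[aa])
    return total


print(back_translate("MKF*"))                              # ATGAAATTTTAA
print(back_translate("MKF*", {"AAG": 1.0, "TTC": 2.0}))    # ATGAAGTTCTAA (made-up weights)
print(count_encodings("MKF*"))                             # 12
```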
Identification of open-reading frames:
StarORF
StarORF - facilitates the identification of the protein(s) encoded within a DNA sequence. Using StarORF, the DNA sequence is first transcribed into RNA and then translated into all the potential open reading frames (ORFs) encoded within each of the six translation frames (3 in the forward direction and 3 in the reverse direction). This allows students to identify the translation frame that results in the longest protein-coding sequence.
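A simple ORF scan over all six frames, in the spirit of the description above, might look like the following. This is a minimal sketch rather than the algorithm of StarORF or any other site listed here: it assumes the standard genetic code, takes the first start codon in each frame and extends to the next in-frame stop, and reports 0-based coordinates on the strand that was scanned. The function name and the `min_codons` and `start_codons` parameters are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def find_orfs(dna, min_codons=2, start_codons=("ATG",)):
    """Return (strand, frame, start, end, protein) tuples, longest first.
    'end' includes the stop codon; coordinates refer to the scanned strand."""
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", dna.translate(COMPLEMENT)[::-1])):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon in start_codons:
                    start = i
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        # The initiator codon is read as Met even when it
                        # is an alternative start such as GTG.
                        protein = "M" + translate(seq[start + 3:i])
                        orfs.append((strand, frame + 1, start, i + 3, protein))
                    start = None
    return sorted(orfs, key=lambda orf: len(orf[4]), reverse=True)


print(find_orfs("CCATGAAATTTTAGGG"))
# [('+', 3, 2, 14, 'MKF')]
```

Passing `start_codons=("ATG", "GTG")` also picks up ORFs that begin with the alternative start codon GTG.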
GeneMark Homepage
GeneMark
Homepage (M. Borodovsky, Georgia Institute of Technology Atlanta, U.S.A.)
offers a family of programs for ORF analysis. This site links one to a
growing number of programs for modeling phage, bacterial, and eukaryotic
data. Extensive control is possible with the data output, i.e. one can
request the nucleotide and protein sequence of the ORFs. Two programs to
consider are
GeneMarkS
(Reference: Besemer J et al. 2001. Nucleic Acids
Research; 29:2607-2618)
or
GeneMarkS-2
and
Heuristic Approach for Gene Prediction
(Reference: Besemer J & Borodovsky M. 1999. Nucleic
Acids Research; 27:3911-3920).
For metagenomic analysis use
MetaGeneMark
(Reference: Zhu, W. et al. 2010. Nucleic Acids
Research; 38: e132).
EasyGene
EasyGene
- produces a list of predicted genes given a sequence of prokaryotic DNA.
Each prediction is attributed with a significance score (R-value)
indicating how likely it is to be just a non-coding open reading frame
rather than a real gene. The user needs only to specify the organism
hosting the query sequence.
(Reference: T.S. Larsen & A. Krogh. (2003). BMC
Bioinformatics 4: 21)
FramePlot 4.0
FramePlot 4.0
(National Institute of Health, Japan) - This site permits one to select
the minimal size of the ORF, and the start codon (ATG or GTG being the
most common). While the presentation (a series of coloured arrows) is
somewhat confusing, clicking on any arrow lets one view the DNA and
protein sequence. These can be used in homology (BLASTN & BLASTP)
searches.
(Reference: Ishikawa,J. & Hotta K. 1999. FEMS
Microbiol. Lett. 174 :251-253).
ExPASy
ExPASy – Translate tool (ExPASy, University of Geneva, Switzerland). I find this site useful if I have a gene which begins with an alternative start codon.
Codon usage:
When you have identified a potential gene you might want to determine its codon usage. The Codon Adaptation Index (CAI) is a measure of codon usage bias: it measures how far the codon usage of a given protein-coding gene deviates from that of a reference set of genes.
Codon Usage Database
For quantitative data on general codon usage in different cells consult the Codon Usage Database (Kazusa DNA Research Institute, Japan). Unfortunately, the data are presented in frequency charts, which have to be manually converted to % codon usage for each amino acid. In addition, the data have not been updated since 2007. For information on the codons, see DNA analysis (Codon Usage), which is part of The Sequence Manipulation Suite (Paul Stothard) at Bioinformatics.org/The Open Lab.
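The manual conversion mentioned above, from overall codon frequencies to the share of each codon within its amino acid, is simple arithmetic and can be scripted. The sketch below is an illustration under stated assumptions: it uses the standard genetic code, accepts either raw counts or frequencies (such as per-thousand values copied from a usage table), and the function names and example numbers are made up for demonstration.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codon_counts(cds):
    """Count the complete, unambiguous codons of an in-frame coding sequence."""
    cds = cds.upper()
    codons = (cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    return Counter(c for c in codons if c in CODON_TABLE)


def fraction_per_amino_acid(values):
    """Convert codon counts or frequencies into the fraction of each codon
    among the synonymous codons for its amino acid (fractions sum to 1)."""
    totals = defaultdict(float)
    for codon, value in values.items():
        totals[CODON_TABLE[codon]] += value
    return {
        codon: value / totals[CODON_TABLE[codon]]
        for codon, value in values.items()
        if totals[CODON_TABLE[codon]] > 0
    }


# From a sequence: lysine is encoded twice by AAA and once by AAG.
fractions = fraction_per_amino_acid(codon_counts("ATGAAAAAGAAATAA"))
print(round(fractions["AAA"], 3), round(fractions["AAG"], 3))  # 0.667 0.333

# From frequencies taken from a usage table (made-up example values).
table_values = {"AAA": 24.0, "AAG": 32.0}
print(fraction_per_amino_acid(table_values))
```

Multiply the fractions by 100 to obtain % codon usage per amino acid.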
Inidon
Inidon (Andre Villegas, Public Health Ontario, Canada) - this Java-based program reads GenBank *.ffn files (FASTA-formatted gene files) and reports the number and percentage of each start codon used. The *.ffn files for sequenced genomes can be downloaded from the GenBank genome site. For bacteriophage and other smaller genomes, locate the file using the "search genome" function at NCBI and select "Views - coding regions." From the next screen use "Save - FASTA nucleotide." This program is currently unavailable online, but the Perl script can be downloaded from here.
CAI Calculator 2
CAI Calculator 2 (John Peden) - Codon usage is biased within and across genomes. The unequal frequency of codons results mainly from the overall base composition of the genome; however, some genes, such as those that are highly expressed, tend to exhibit stronger codon bias. Sharp & Li (1987) proposed the codon adaptation index (CAI) to evaluate how well a gene is adapted to the translational machinery. CAI is a single value that summarizes the codon usage of a gene relative to the codon usage of a reference set of genes. A higher CAI value usually suggests that the gene of interest is likely to be highly expressed. This site offers the choice of the Sharp & Li (1987) or Eyre-Walker (1996) equations for calculating CAI.
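In the Sharp & Li formulation, each codon receives a relative adaptiveness w (its frequency in the reference set divided by the frequency of the most-used synonymous codon), and the CAI of a gene is the geometric mean of w over its codons. The sketch below illustrates that calculation only; it is not the code behind CAI Calculator 2. It assumes the standard genetic code, skips Met, Trp and stop codons, and replaces zero counts in the reference set with 0.5, which is a choice made for this example. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def split_codons(cds):
    """Split an in-frame coding sequence into complete codons."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]


def relative_adaptiveness(reference_cds, pseudocount=0.5):
    """Compute w for every codon of every amino acid with 2+ codons."""
    counts = Counter()
    for cds in reference_cds:
        counts.update(split_codons(cds))
    groups = {}
    for codon, aa in CODON_TABLE.items():
        groups.setdefault(aa, []).append(codon)
    w = {}
    for aa, codons in groups.items():
        if aa == "*" or len(codons) == 1:
            continue  # stops, Met and Trp carry no information about bias
        values = {c: counts.get(c, 0) or pseudocount for c in codons}
        top = max(values.values())
        for c in codons:
            w[c] = values[c] / top
    return w


def cai(cds, w):
    """Geometric mean of w over the informative codons of the gene."""
    logs = [math.log(w[c]) for c in split_codons(cds) if c in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


# Toy reference set in which AAA is used twice as often as AAG for lysine.
w = relative_adaptiveness(["AAAAAAAAG"])
print(cai("AAAAAA", w))            # 1.0
print(round(cai("AAAAAG", w), 4))  # 0.7071
```

A real reference set would be a collection of highly expressed genes from the organism of interest, not a single toy sequence.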
CAIcal
CAIcal -
performs several computations in relation to codon usage and the codon
adaptation of DNA or RNA sequences to host organisms.
(Reference: Puigbo, P. et al. 2008. Biology Direct
3:38).
E-CAI
E-CAI
(Expected CAI calculation) - calculates the expected value of the Codon
Adaptation Index (CAI) for a set of query sequences by generating random
sequences with similar G+C content and amino acid composition to the
input. This expected CAI therefore provides a direct threshold value for
discerning whether the differences in the CAI value are statistically
significant and arise from the codon preferences or whether they are
merely artifacts that arise from internal biases in the G+C composition
and/or amino acid composition of the query sequences.
(Reference: Puigbo, P. et al. 2008. BMC
Bioinformatics 9:65).
GCUA
GCUA - Graphical Codon UsAge (Universität Regensburg Naturwissenschaftliche Fakultät III, Germany) - offers three possibilities: (a) each triplet position vs usage table - the fraction of usage of each codon in the selected organism is presented; (b) each codon vs. usage table - the fraction of usage of each codon in the submitted sequence will be computed and plotted against the fraction of usage of the codon in the selected organism; and, (c) compare two usage tables - submit or choose two codon usage tables. The fraction of usage of each codon in the submitted usage tables will be compared graphically.
Codon Statistics Database
Codon Statistics Database:
A Database of Codon Usage Bias - Enter a taxonomy ID (e.g. "9606"), the
name of a species (e.g. "Human" or "Homo sapiens") or a group of species
(e.g. "Primates"). Then select an option from the drop-down menu and press
"Submit". It then provides two sets of tables. One set lists, for each
codon, the frequency, the Relative Synonymous Codon Usage, and whether the
codon is preferred. Another set of tables lists, for each gene, its GC
content, Effective Number of Codons, Codon Adaptation Index, and frequency
of optimal codons.
(Reference: Subramanian K et al. (2022) Molec Biol
Evol 39(8) DOI: https://doi.org/10.1093/molbev/msac157)
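Of the statistics listed above, Relative Synonymous Codon Usage (RSCU) is the easiest to reproduce by hand: it is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below is an illustration only, assuming the standard genetic code and an in-frame coding sequence; it is not the database's own code, and the function name is hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return RSCU for every codon of each amino acid seen in the sequence.
    RSCU = count * (number of synonymous codons) / (total count for the
    amino acid); 1.0 means no bias, values above 1.0 mean preferred."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    groups = {}
    for codon, aa in CODON_TABLE.items():
        groups.setdefault(aa, []).append(codon)
    values = {}
    for aa, codons in groups.items():
        total = sum(counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid not present in this sequence
        for c in codons:
            values[c] = counts.get(c, 0) * len(codons) / total
    return values


values = rscu("AAAAAAAAG")
print(round(values["AAA"], 3), round(values["AAG"], 3))  # 1.333 0.667
```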
Rare codon analysis tool
Rare codon analysis tool (GenScript USA Inc.) - it is extremely useful to analyze your coding sequences for codon usage prior to attempting protein expression. This tool offers two bacteria (E. coli & Streptomyces), a variety of plants (Nicotiana & Arabidopsis), animals (human & insects) and yeast (Pichia & Saccharomyces).
PAL2NAL
PAL2NAL -
a program that converts a multiple sequence alignment of proteins and the
corresponding DNA (or mRNA) sequences into a codon alignment. The program
automatically assigns the corresponding codon sequence even if the input
DNA sequence has mismatches with the input protein sequence, or contains
UTRs or polyA tails. It can also deal with frame shifts in the input
alignment, which is suitable for the analysis of pseudogenes. The
resulting codon alignment can further be subjected to the calculation of
synonymous (dS) and non-synonymous (dN) substitution
rates.
(Reference: Suyama M et al. 2006. Nucleic Acids Res.
34: W609-W612).
If you want to express a gene in an organism having different codon usage:
JCat
JCat - Codon Adapter Tool - offers a
complete range of eukaryotic & prokaryotic cells; and, the ability to
select against rho-independent terminators and restriction sites.
(Reference: A. Grote et al. 2005. Nucl. Acids Res.
33: W526-W531).
OPTIMIZER
OPTIMIZER: a web server for optimizing the
codon usage of DNA sequences - one can use pre-computed tables from more
than 150 prokaryotic species under a strong translational selection. Three
methods of optimization are available: the 'one amino acid - one codon'
approach, a random approach or an intermediate one. Several options, such
as avoiding specific restriction sites and several outputs, are also
available. This server can be useful for predicting and optimizing the
level of expression of a gene in heterologous gene expression.
(Reference: P. Puigbò et al. 2007. Nucl. Acids Res.
35(Web Server issue): W126-131).
IDT Codon Optimization Tool
IDT Codon Optimization Tool - was developed to optimize a DNA or protein sequence from one organism for expression in another by reassigning codon usage based on the frequencies of each codon's usage in the new organism. For example, valine is encoded by 4 different codons (GUG, GUU, GUC, and GUA). In human cell lines, however, the GUG codon is preferentially used (46% use vs. 18, 24, and 12%, respectively). The codon optimization tool takes this information into account and assigns valine codons with those same frequencies. In addition, the tool algorithm eliminates codons with less than 10% frequency and re-normalizes the remaining frequencies to 100%. Moreover, the optimization tool reduces complexities that can interfere with manufacturing and downstream expression, such as repeats, hairpins, and extreme GC content. Requires registration.
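The frequency-based scheme described above (drop codons used less than 10% of the time, re-normalize, then assign codons in proportion to the remaining frequencies) can be sketched as follows. This is an illustration of the idea, not IDT's algorithm: the valine frequencies are those quoted in the text, written as DNA codons, while the function names, the `seed` parameter and the restriction of the usage table to two amino acids are assumptions made for this example. Screening for repeats, hairpins and GC content is not attempted.

```python
import random

# Codon usage per amino acid as fractions. Valine values are the human
# figures quoted in the text (GUG 46%, GUU 18%, GUC 24%, GUA 12%);
# methionine has a single codon. Other amino acids are omitted here.
USAGE = {
    "M": {"ATG": 1.0},
    "V": {"GTG": 0.46, "GTT": 0.18, "GTC": 0.24, "GTA": 0.12},
}


def filter_and_renormalize(freqs, threshold=0.10):
    """Drop codons below the threshold and rescale the rest to sum to 1.
    At least one codon always survives for a real usage distribution,
    because the most-used of at most six codons exceeds 10%."""
    kept = {codon: f for codon, f in freqs.items() if f >= threshold}
    total = sum(kept.values())
    return {codon: f / total for codon, f in kept.items()}


def sample_codons(protein, usage, threshold=0.10, seed=None):
    """Pick one codon per residue at random, weighted by filtered usage."""
    rng = random.Random(seed)
    dna = []
    for aa in protein.upper():
        dist = filter_and_renormalize(usage[aa], threshold)
        codons, weights = zip(*dist.items())
        dna.append(rng.choices(codons, weights=weights, k=1)[0])
    return "".join(dna)


# All four valine codons are at or above 10%, so none is eliminated.
print(filter_and_renormalize(USAGE["V"]))
print(sample_codons("MVVV", USAGE, seed=1))
```

With a distribution such as 92% and 8% for two synonymous codons (made-up numbers), the rare codon would be removed and the other used every time.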
GenSmart™ Codon Optimization
GenSmart™ Codon Optimization - is a free, user-friendly online tool that enables you to optimize the design of wild type or recombinant gene sequences towards higher expression in prokaryotic and mammalian expression systems.
VectorBuilder Codon Optimization Tool
VectorBuilder's Codon Optimization Tool is designed to help you achieve the optimal codon adaptation index (CAI) for your gene of interest (GOI) in any organism of your choice. It includes a comprehensive list of species and is incorporated into the VectorBuilder online vector design platform, enabling you to optimize your GOIs while designing vectors. Additionally, it allows you to avoid cleavage sites of selected restriction enzymes while codon optimizing your target sequence. The tool can be used for optimizing sequences with extreme GC content and simple repeats for highly efficient gene synthesis and DNA cloning applications.
RBS Calculator
RBS Calculator
- the authors developed a biophysical model employing thermodynamic first
principles and a four-parameter free energy model to accurately predict
the ribosome's translation initiation rates for 136 synthetic 5′ UTRs with
large structures, diverse shapes and multiple standby site modules. The
model predicts and experiments confirm that the ribosome can readily bind
distant standby site modules that support high translation rates,
providing a physical mechanism for observed context effects and long-range
post-transcriptional regulation.
(Reference: A. E. Borujeni, et al. 2014. Nucleic
Acids Research; 42 (4): 2646–2659).
IRES (Internal Ribosome Entry Site) segments are known to attract the eukaryotic ribosomal translation initiation complex and thus promote translation initiation independently of the presence of the commonly utilized 5'-terminal m7G cap structure. It is not yet clear whether this activity can be attributed to a common sequence or to a common secondary structure present in them. Such IRES regions have been found in a broad range of +RNA viruses and in the untranslated regions of some eukaryotic cellular mRNAs. Database 1; Database 2
IRESpy
IRESpy
- is a fast, reliable, high-throughput IRES online prediction tool. It
provides a publicly available tool for all IRES researchers, and can be
used in other genomics applications such as gene annotation and analysis
of differential gene expression.
(Reference: Wang J & Gribskov M (2019) BMC
Bioinformatics 20: 409).
IRESPred
IRESPred - was developed for the prediction of
both viral and cellular IRES using Support Vector Machine (SVM). The
predictive model was built using 35 features that are based on sequence
and structural properties of UTRs and the probabilities of interactions
between UTR and small subunit ribosomal proteins (SSRPs). The model was
found to have 75.51% accuracy, 75.75% sensitivity, 75.25% specificity, and
75.75% precision.
(Reference: Kolekar P et al. (2016) Sci Rep. 6:
27436).
IRESbase
IRESbase is a comprehensive database of
experimentally verified viral and eukaryotic internal ribosome entry sites
(IRESs) with BLAST search capacity
(Reference: Wu TY et al. (2009) BMC Bioinformatics
10: 160).
Updated: November, 2025