Eukaryotic Gene Translation
Because many eukaryotic genes are interrupted by introns, it can be difficult to identify the protein sequence encoded by a gene. Furthermore, programs designed to recognize intron/exon boundaries in a particular organism or group of organisms may not recognize all intron/exon boundaries.
No single site should be relied upon; rather, when studying eukaryotic genes, take a combinatorial approach that incorporates BLAST and the programs outlined below.
The following programs identify intron-exon boundaries. To help you assess the relative merits of each site, I have attached GenBank files containing human, plant and Drosophila gene sequences in which the submitters have designated the intron and exon sequences and the protein product.
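To illustrate why the annotated exon coordinates matter, here is a minimal Python sketch: the protein is read from the exons joined together, not from the raw genomic sequence. It assumes 1-based inclusive exon coordinates on the forward strand, as in a GenBank `join(...)` location. The toy sequence, the coordinates and the function names are hypothetical and are not taken from the attached files.

```python
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def splice_exons(genomic, exons):
    """Join exon segments given as 1-based inclusive (start, end) pairs."""
    return "".join(genomic[start - 1:end] for start, end in exons)


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        # Codons containing ambiguous bases are reported as X.
        aa = CODON_TABLE.get(cds[pos:pos + 3].upper(), "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical toy gene: two exons separated by a GT...AG intron.
genomic = "ATGGCC" + "GTAAGTTTTCAG" + "AAATGA"
exons = [(1, 6), (19, 24)]
cds = splice_exons(genomic, exons)
print("spliced CDS:", cds, "->", translate(cds))
print("unspliced  :", genomic, "->", translate(genomic))
```

Translating the unspliced sequence reads straight through the intron and yields a different, incorrect peptide, which is why a wrongly predicted boundary changes the inferred protein.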
AUGUSTUS
AUGUSTUS
- predicts genes in eukaryotic sequences (human, Drosophila, Arabidopsis,
Brugia, Aedes, Coprinus & Tribolium) using a generalized hidden Markov
model, a probabilistic model of a sequence and its gene structure. The
web server allows the user to impose constraints on the predicted gene
structure
(Reference: M. Stanke & B. Morgenstern. 2005. Nucl.
Acids Res. 33: W465-W467).
WebAUGUSTUS
is an updated version which provides an interface for training AUGUSTUS
for predicting genes in genomes of novel species. It also enables you to
predict genes in a genome sequence with already trained parameters.
(Reference: K.J. Hoff & M. Stanke. 2013. Nucl.
Acids Res. 41(Web Server issue): W123-W128).
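The hidden Markov model idea behind AUGUSTUS can be sketched with a much simpler toy. The Python below is an ordinary two-state HMM decoded with the Viterbi algorithm; it is not the generalized HMM that AUGUSTUS uses. The states, the probabilities (GC-rich exons, AT-rich introns, states that tend to persist) and the function name are all hypothetical, chosen only to show how a probabilistic model assigns a gene structure to a sequence.

```python
import math

STATES = ("E", "I")  # E = exon, I = intron
# Hypothetical parameters, for illustration only.
START = {"E": 0.5, "I": 0.5}
TRANS = {"E": {"E": 0.9, "I": 0.1}, "I": {"E": 0.1, "I": 0.9}}
EMIT = {
    "E": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "I": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}


def viterbi(seq):
    """Return the most probable state path for a non-empty A/C/G/T string."""
    # scores[s] = best log-probability of any path that ends in state s
    scores = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (scores[prev] + math.log(TRANS[prev][s])
                             + math.log(EMIT[s][base]))
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("GCGCGCATATAT"))
```

With these toy parameters the GC-rich half is labelled as exon and the AT-rich half as intron. Real gene finders use many more states (splice sites, frames, UTRs) and, in a generalized HMM, explicit length distributions for each state.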
GENSCAN
GENSCAN (C. Burge, Massachusetts Institute of Technology, U.S.A.)
GenomeScan
GenomeScan (C. Burge, MIT, U.S.A.) - The newer version of GENSCAN; it can be used to predict vertebrate, Arabidopsis & maize genes.
GeneMark
GeneMark
(Georgia Institute of Technology, U.S.A.) - Pre-trained model
parameters for several species are available through the
GeneMark.hmm
page. For metagenomic analysis use
MetaGeneMark
(Reference: Zhu, W. et al. 2010. Nucleic Acids
Research; 38: e132)
geneid
geneid (Genome Informatics Research Lab, Universitat Pompeu Fabra, Spain) - Prediction of human & Drosophila genes.
HMMgene
HMMgene (Anders Krogh, Center for Biological Sequence Analysis, Denmark) - Prediction of vertebrate and C. elegans genes.
SpliceAPP
SpliceAPP
- Splice-Alternative Profile Predictor is an interactive web server for
the prediction of RNA splicing in humans, together with a searchable
database of collected splicing variants processed by the prediction tool.
(Reference: Huang AC et al (2024) BMC Genomics
volume 25, Article number: 600)
NNSPLICE
NNSPLICE
- Human and Drosophila Splice Site Prediction by Neural Network
(Reference: Reese MG et al (1997) J Comp Biol 4(3):
311-323).
SplicePort
SplicePort:
An Interactive Splice Site Analysis Tool - for splice-site analysis that
allows the user to make splice-site predictions for submitted sequences.
In addition, the user can also browse the rich catalog of features that
underlies these predictions, and which we have found capable of providing
high classification accuracy on human splice sites. Feature selection is
optimized for human splice sites, but the selected features are likely to
be predictive for other mammals as well. With our interactive feature
browsing and visualization tool, the user can view and explore subsets of
features used in splice-site prediction (either the features that account
for the classification of a specific input sequence or the complete
collection of features). Selected feature sets can be searched, ranked or
displayed easily. The user can group features into clusters, and
frequency-plot WebLogos can be generated.
(Reference: Dogan, R.I. et al. 2007. Nucl. Acids
Res. 35(Web Server issue): W285-W291).
SpliceRover
SpliceRover
- is a predictive deep learning approach that outperforms the
state-of-the-art in splice site prediction. SpliceRover uses
convolutional neural networks (CNNs), which have been shown to obtain
cutting edge performance on a wide variety of prediction tasks. We
adapted this approach to deal with genomic sequence inputs, and show it
consistently outperforms already existing approaches, with relative
improvements in prediction effectiveness of up to 80.9% when measured in
terms of false discovery rate. However, a major criticism of CNNs
concerns their 'black box' nature, as mechanisms to obtain insight into
their reasoning processes are limited. To facilitate interpretability of
the SpliceRover models, we introduce an approach to visualize the
biologically relevant information learnt.
(Reference: Zuallaert J et al. (2018)
Bioinformatics; 34(24): 4180-4188).
iSS-PC
iSS-PC (identifying splicing sites via
physical-chemical properties using deep sparse auto-encoder) - incorporates
twelve physical-chemical properties of the dinucleotides within DNA into
PseDNC (pseudo dinucleotide composition) to formulate given sequence
samples via a battery of cross-covariance and auto-covariance transformations.
(Reference: Chen W et al. Biomed Research
International 2014: 623149).
HSF 3.0
HSF 3.0 Human
Splicing Finder (Aix Marseille Université, France) - this system combines
12 different algorithms to identify splicing motifs and to predict the
effects of mutations on them, including the acceptor and donor splice
sites, the branch point and auxiliary sequences known to either enhance
or repress splicing: Exonic Splicing Enhancers (ESE) and Exonic Splicing
Silencers (ESS). These algorithms are based on position weight matrices
(PWMs), the Maximum Entropy principle or the Motif Comparison method. HSF
can be used to predict the effects of mutations on splicing signals or to
identify splicing motifs in any human sequence. It contains all available
matrices for auxiliary sequence prediction as well as new ones for
binding sites of the 9G8 and Tra2-beta Serine-Arginine proteins and the
hnRNP A1 ribonucleoprotein. We also developed new Position Weight
Matrices to assess the strength of 5' and 3' splice sites and branch
points.
(Reference: FO Desmet et al. 2009. Nucleic Acids
Research 37:e67).
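As a rough illustration of the position weight matrix approach mentioned for HSF, the Python sketch below builds a log-odds matrix from a handful of aligned donor sites and uses it to score candidate sites. The training sites, the pseudocount, the uniform background of 0.25 and the function names are all hypothetical; these are not the matrices or the scoring scheme that HSF itself uses.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds position weight matrix from equal-length aligned sites."""
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm


def score(pwm, candidate):
    """Sum the per-position log-odds for a candidate of the same length."""
    return sum(pwm[i][base] for i, base in enumerate(candidate))


def best_window(pwm, seq):
    """Return (score, 0-based start) of the highest-scoring window in seq."""
    width = len(pwm)
    return max((score(pwm, seq[i:i + width]), i)
               for i in range(len(seq) - width + 1))


# Hypothetical aligned donor sites: 3 exonic bases followed by 6 intronic bases.
DONOR_SITES = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "TAGGTAAGT", "CAGGTGAGC"]
pwm = build_pwm(DONOR_SITES)

wild_type = "CAGGTAAGT"
mutant = "CAGCCAAGT"  # the invariant GT dinucleotide is destroyed
print(score(pwm, wild_type), score(pwm, mutant))
print(best_window(pwm, "TTTTCAGGTAAGTCCC"))
```

Comparing the wild-type and mutant scores is the basic way a matrix-based tool flags a mutation that weakens a splice site; a large drop suggests the site may no longer be recognized.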
ASSEDA
ASSEDA (Automated Splice Site and Exon Definition Analyses) - is a tool to predict the effects of sequence changes that alter mRNA splicing in human diseases. We designed the system to evaluate changes in splice site strength based on information theory-based models of donor and acceptor splice sites. N.B. You need to register.
NetGene2
NetGene2
- produces neural network predictions of splice sites in human, C.
elegans and A. thaliana DNA. Restrictions: submit at most one sequence,
between 200 and 100,000 nucleotides long.
(Reference: S.M. Hebsgaard et al. 1996. Nucl. Acids
Res. 24:3439-3452).
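Before submitting to a server with restrictions like those quoted for NetGene2, a quick local check can save a failed job. The Python sketch below assumes the input is FASTA text, which is an assumption of this example rather than a documented requirement of the server, and the function names are hypothetical.

```python
MIN_LENGTH, MAX_LENGTH = 200, 100_000  # limits quoted for NetGene2 above


def read_single_fasta(text):
    """Parse FASTA text that must hold exactly one record; return (header, sequence)."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                raise ValueError("more than one sequence supplied")
            header = line[1:]
        else:
            chunks.append(line.upper())
    if header is None:
        raise ValueError("no FASTA header found")
    return header, "".join(chunks)


def check_submission(text):
    """Raise ValueError unless text is one sequence within the length limits."""
    header, seq = read_single_fasta(text)
    if not MIN_LENGTH <= len(seq) <= MAX_LENGTH:
        raise ValueError(
            f"sequence length {len(seq)} is outside {MIN_LENGTH}-{MAX_LENGTH}"
        )
    return header, seq


header, seq = check_submission(">toy_gene\n" + "ACGT" * 50)
print(header, len(seq))
```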
If you want to express a gene in an organism having different codon usage:
JCat
JCat - Codon
Adapter Tool - supports a complete range of eukaryotic & prokaryotic
host cells, with the option to select against rho-independent terminators
and restriction sites.
(Reference: A. Grote et al. 2005. Nucl. Acids Res.
33: W526-W531).
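The general idea of codon adaptation can be sketched in a few lines of Python: back-translate the protein using the host's preferred codon for each amino acid, then swap in a synonymous codon wherever an unwanted restriction site appears. This is a simple greedy illustration, not JCat's algorithm. The codon preference tables describe an imaginary host and cover only a few amino acids; the only real-world fact used is that EcoRI recognizes GAATTC.

```python
# Hypothetical preferred codon per amino acid for an imaginary host organism.
PREFERRED = {"M": "ATG", "E": "GAA", "F": "TTC", "K": "AAG", "A": "GCC", "*": "TAA"}
# Hypothetical second-choice synonymous codons, used only to remove a site.
ALTERNATIVE = {"E": "GAG", "F": "TTT", "K": "AAA", "A": "GCT"}


def back_translate(protein, avoid=()):
    """Back-translate with preferred codons, greedily removing sites in `avoid`."""
    codons = [PREFERRED[aa] for aa in protein]
    for site in avoid:
        for i, aa in enumerate(protein):
            current = "".join(codons)
            if site not in current:
                break
            if aa in ALTERNATIVE:
                trial = codons[:i] + [ALTERNATIVE[aa]] + codons[i + 1:]
                # Keep the swap only if it reduces the number of site occurrences.
                if "".join(trial).count(site) < current.count(site):
                    codons = trial
    return "".join(codons)


ECORI = "GAATTC"
print(back_translate("MEFK*"))
print(back_translate("MEFK*", avoid=(ECORI,)))
```

In this toy case the preferred codons for E and F happen to create an EcoRI site, and switching E to its second-choice codon removes it without changing the encoded protein. A production tool would also weigh the cost of each substitution against overall codon usage.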
mRNAdesigner
mRNAdesigner -
is a web server specifically designed to optimize mRNA sequences to enhance protein expression levels in eukaryotic
expression systems. Users can input their coding sequence (CDS) of interest, and mRNAdesigner will iteratively optimize the
CDS using Monte Carlo tree search algorithms based on user-defined constraints (such as GC content, codon preference, maximum
allowed stem-loop length, etc.). The optimized CDS features a high codon adaptation index, moderate structure, and
user-defined GC ratio. Based on the optimized CDS, mRNAdesigner will identify a 5' UTR with high mean ribosome load (MRL) and
a 3' UTR with minimal regulatory elements to complement the CDS, forming a new mRNA sequence for downstream applications such
as antigen protein expression.
(Reference: Mo O et al. 2025. Nucleic Acids Research 53(W1): W415 - W426).
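Two of the quantities named above, GC content and the codon adaptation index (CAI), are simple to compute. The Python sketch below is a hedged illustration, not mRNAdesigner's code, and it does not attempt the Monte Carlo tree search. CAI is taken as the geometric mean of each codon's relative adaptiveness w, that is, its frequency divided by the frequency of the most-used synonymous codon. The w values here are hypothetical and cover only a few codons.

```python
import math

# Hypothetical relative adaptiveness values (w) for an imaginary host.
WEIGHTS = {
    "GCC": 1.0, "GCT": 0.5,   # Ala
    "AAG": 1.0, "AAA": 0.25,  # Lys
    "GAG": 1.0, "GAA": 0.5,   # Glu
}


def gc_content(seq):
    """Fraction of G and C bases in a non-empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def codon_adaptation_index(cds, weights):
    """Geometric mean of w over the codons of cds that appear in `weights`.

    Codons without a weight (here, everything outside the toy table) are
    skipped; at least one codon must be scorable.
    """
    cds = cds.upper()
    logs = [
        math.log(weights[cds[i:i + 3]])
        for i in range(0, len(cds) - 2, 3)
        if cds[i:i + 3] in weights
    ]
    return math.exp(sum(logs) / len(logs))


optimized = "GCCAAGGAG"  # Ala-Lys-Glu with the top codon at each position
original = "GCTAAAGAA"   # same peptide, less-preferred codons
for name, cds in (("optimized", optimized), ("original", original)):
    print(name, round(gc_content(cds), 3), round(codon_adaptation_index(cds, WEIGHTS), 3))
```

An optimizer of this kind would score many candidate sequences with metrics like these, alongside structure and UTR criteria, and keep the ones that satisfy the user-defined constraints.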
Updated: February, 2026