Protein Chemistry
Background Information: You might want to consult Robert Russell's Guide to Structure Prediction or Heinz Reiske's more recent "A Beginner's Guide to Protein Structure Prediction." For the biochemical properties of amino acids, see Amino Acid Hydrophobicity and Amino Acid Chart and Reference Table (GenScript). If you are specifically interested in antibodies, I would recommend that you visit "The Antibody Resource Page."
Table of amino acid abbreviations
Amino acid composition, mass & pI:
ProtParam
Amino acid composition & Mass - ProtParam (ExPASy, Switzerland)
Protein Molecular Weight Calculator
Protein Molecular Weight Calculator - (Altogen Labs, Austin, Texas) - allows batch determination of protein molecular weights
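For readers who want to check a calculator's output by hand, the following is a minimal sketch of an average molecular weight calculation for unmodified linear chains. It is not the method of any particular server listed here; the function names are hypothetical, and the residue masses are standard average values for the 20 common amino acids.

```python
# Average residue masses in daltons (free amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added back for the free N- and C-termini


def molecular_weight(sequence: str) -> float:
    """Return the average molecular weight (Da) of an unmodified linear chain."""
    sequence = sequence.strip().upper()
    unknown = set(sequence) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    if not sequence:
        return 0.0
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


def batch_molecular_weights(records: dict) -> dict:
    """Map each sequence name to its molecular weight, keeping the names."""
    return {name: molecular_weight(seq) for name, seq in records.items()}
```

Post-translational modifications, disulfide bridges and non-standard residues are deliberately ignored, so results for modified peptides will differ from those of the servers.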
Compute pI/Mw tool
Isoelectric Point - Compute pI/Mw tool (ExPASy, Switzerland). If you want a plot of the relationship between charge and pH use ProteinChemist (ProteinChemist.com)
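The relationship between charge and pH can be sketched with the Henderson-Hasselbalch equation, and the pI found as the pH where the net charge crosses zero. The pKa values below are illustrative assumptions (published scales differ, which is one reason servers disagree), and the function names are hypothetical; this is not the algorithm of Compute pI/Mw or ProteinChemist.

```python
# Assumed pKa values for illustration only; published scales differ.
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
N_TERM_PKA = 9.0
C_TERM_PKA = 2.0


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a linear chain at the given pH (Henderson-Hasselbalch)."""
    sequence = sequence.upper()
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))    # N-terminal amine
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))   # C-terminal carboxyl
    for aa, pka in POSITIVE_PKA.items():
        charge += sequence.count(aa) / (1.0 + 10 ** (ph - pka))
    for aa, pka in NEGATIVE_PKA.items():
        charge -= sequence.count(aa) / (1.0 + 10 ** (pka - ph))
    return charge


def isoelectric_point(sequence: str, tolerance: float = 1e-4) -> float:
    """Bisect pH 0-14 for the point where the net charge is zero."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid    # still positive: the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2


# Points for a charge-versus-pH plot, one per pH unit.
curve = [(ph, net_charge("ACDEKH", ph)) for ph in range(0, 15)]
```

Bisection works here because the net charge decreases monotonically as pH rises.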
Peptide Calculator
Peptide Calculator (Bachem AG, Bubendorf, Switzerland) - calculates the molecular weight (Mr), average hydrophobicity, isoelectric point, and charge at pH 7.0
PEPSTATS and Biochemistry-online
Mass, pI, composition and mol% acidic, basic, aromatic, polar etc. amino acids - PEPSTATS (EMBOSS). Biochemistry-online (Vitalonic, Russia) reports % composition, molecular weight, pI, and charge at any desired pH.
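Percent composition and mole percent by residue class can be sketched in a few lines. The class memberships below are assumptions chosen for illustration; individual servers may group residues differently (histidine, for example, is counted here as both basic and aromatic).

```python
from collections import Counter

# Assumed residue classes; check each server's documentation for its own grouping.
RESIDUE_CLASSES = {
    "acidic": set("DE"),
    "basic": set("KRH"),
    "aromatic": set("FWYH"),
    "aliphatic": set("AILV"),
}


def composition_percent(sequence: str) -> dict:
    """Percentage of each residue type present in the sequence."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    counts = Counter(sequence)
    return {aa: 100.0 * n / len(sequence) for aa, n in sorted(counts.items())}


def class_mole_percent(sequence: str) -> dict:
    """Mole percent of residues falling in each assumed class."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    return {
        name: 100.0 * sum(aa in members for aa in sequence) / len(sequence)
        for name, members in RESIDUE_CLASSES.items()
    }
```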
Peptide Molecular Weight Calculator
Peptide Molecular Weight Calculator (GenScript) - the online calculator determines the chemical formula and molecular weight of your peptide of interest. You can also specify post-translational modifications, such as N- and C- terminal modifications and positioning of disulfide bridges, to obtain more accurate outputs.
Isoelectric Point Calculator 2.0
Isoelectric Point Calculator 2.0
(IPC 2.0) - is a server for the prediction of isoelectric points and pKa
values using a mixture of deep learning and support vector regression
models. The prediction accuracy (RMSD) of IPC 2.0 for proteins and
peptides outperforms previous algorithms.
(Reference: Kozlowski LP (2022) Nucl. Acids Res. Web
Server issue 50(Issue D1): D1535-D1540).
Composition/Molecular Weight Calculation
Composition/Molecular Weight Calculation (Georgetown University Medical Center, U.S.A.) - the only problem with this site is that, when run in batch mode, it identifies each sequence by sequential number rather than by name
Batch Protein Isoelectric Point determination
Batch Protein Isoelectric Point determination - part of the Sequence Manipulation Suite or ENDMEMO
Batch Protein Molecular Weight determination
Batch Protein Molecular Weight determination - part of the Sequence Manipulation Suite or ENDMEMO
Protein calculator
Protein calculator (C. Putnam, The Scripps Research Institute, U.S.A.) - calculates mass, pI, charge at a given pH, counts amino acid residues etc.
Tm Predictor
Tm Predictor (P.C. Lyu Lab., National Tsing-Hua University, Taiwan) - calculates the theoretical protein melting temperature.
Computation of size of DNA and Protein Fragments from Their Electrophoretic Mobility
Computation of size of DNA and Protein Fragments from Their Electrophoretic Mobility
(Reference: Raghava, G. P. S. 2001. Biotech Software
and Internet Report 2:198-200).
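One common way to estimate fragment size from mobility is to fit a straight line to log10(size) against migration distance for the marker bands and then interpolate the unknowns. The sketch below assumes that log-linear relationship; it is not necessarily the model used by the server above, and the function names are hypothetical.

```python
import math


def fit_log_linear(standards):
    """Least-squares fit of log10(size) = intercept + slope * mobility.

    `standards` is a list of (mobility, size) pairs for the marker bands.
    """
    n = len(standards)
    if n < 2:
        raise ValueError("at least two standards are required")
    xs = [mobility for mobility, _ in standards]
    ys = [math.log10(size) for _, size in standards]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("standards must have different mobilities")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_size(mobility, slope, intercept):
    """Interpolate the size of an unknown band from its mobility."""
    return 10 ** (intercept + slope * mobility)
```

The linear assumption only holds over part of a gel's range, so unknowns should fall between the smallest and largest markers.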
Antigenicity and allergenicity:
A good place to start would be The Immune Epitope Database (IEDB).
AllerTOP, AlgPred and SDAP
Allergenicity servers:
AllerTOP
(Reference: Dimitrov, I. et al. 2013. BMC
Bioinformatics 14(Suppl 6): S4),
AlgPred
- prediction of allergenic proteins and mapping of IgE epitopes
(Reference: Saha, S. and Raghava, G.P.S. 2006.
Nucleic Acids Research 34: W202-W209.),
and SDAP - Structural Database
of Allergenic Proteins
(Reference: Ivanciuc, O. et al. 2003. Nucleic Acids
Res. 31: 359-362).
VIOLIN
VIOLIN
- Vaccine Investigation and OnLine Information Network - allows easy
curation, comparison and analysis of vaccine-related research data across
various human pathogens. VIOLIN is expected to become a centralized source
of vaccine information and to provide investigators in basic and clinical
sciences with curated data and bioinformatics tools for vaccine research
and development. VBLAST: Customized BLAST Search for Vaccine Research
allows various search strategies against 77 genomes of 34
pathogens.
(Reference: He, Y. et al. 2014. Nucleic Acids Res.
42(Database issue): D1124-32).
SVMTriP
SVMTriP
- is a method to predict antigenic epitopes using the latest sequence input
from the IEDB database. A Support Vector Machine (SVM) combines Tri-peptide
similarity and Propensity scores (SVMTriP) to achieve better prediction
performance. Moreover, SVMTriP is capable of recognizing viral peptides
from a human protein sequence background.
(Reference: Yao B et al. (2012) PLoS One 7(9):
e45152).
AllergyPred
AllergyPred -
is a web server that predicts both protein- and chemical-based allergens. Five different models take protein IDs, sequences,
chemical IDs, and structures as inputs for predicting respective allergy endpoints.
(Reference: Kemmler E et al. 2025. Nucleic Acids Research 53(W1): W4 - W10).
Solubility and crystallizability:
EnzymeMiner
EnzymeMiner
- offers automated mining of soluble enzymes with diverse structures,
catalytic properties and stabilities. The solubility prediction employs the
in-house SoluProt predictor developed using machine learning.
(Reference: Hon J et al. 2020. Nucl Acids Res 48 (W1):
W104-W109).
ESPRESSO
ESPRESSO
(EStimation of PRotein ExpreSsion and SOlubility) - is a sequence-based
predictor for estimating protein expression and solubility for three
different protein expression systems: in vivo Escherichia coli,
Brevibacillus, and wheat germ cell-free.
(Reference: Hirose S, & Noguchi T. 2013. Proteomics.
13:1444-1456).
SABLE
SABLE
- Accurate sequence-based prediction of relative Solvent AccessiBiLitiEs,
secondary structures and transmembrane domains for proteins of unknown
structure.
(Reference: Adamczak R et al. 2004. Proteins
56:753-767).
Protein–Sol
Protein–Sol
- is a web server for predicting protein solubility. Using available data
for Escherichia coli protein solubility in a cell-free expression system,
35 sequence-based properties are calculated. Feature weights are determined
from separation of low and high solubility subsets. The model returns a
predicted solubility and an indication of the features which deviate most
from average values.
(Reference: Hebditch M et al. 2017. Bioinformatics
33(19): 3098–3100).
SOLUPROT
SOLUPROT
- was created using the gradient boosting machine technique with the
TargetTrack database as a training set. When evaluated against a balanced
independent test set derived from the NESG database, SoluProt's accuracy of
58.5% and AUC of 0.62 exceeded those of a suite of alternative solubility
prediction tools.
(Reference: Hon, J., et al. (2021) Bioinformatics 37
(1): 23-28).
CamSol
CamSol
- for the rational design of protein variants with enhanced solubility. The
method works by performing a rapid computational screening of tens of
thousands of mutations to identify those with the greatest impact on the
solubility of the target protein while maintaining its native state and
biological activity.
(Reference: Sormanni P et al. (2015) J Molec Biol
427(2): 478-490). N.B. Requires registration.
SERp
Surface Entropy Reduction prediction
(SERp) - this exploratory
tool aims to aid identification of sites that are most suitable for
mutation designed to enhance crystallizability by a Surface Entropy
Reduction approach.
(Reference: Goldschmidt L. et al. 2007. Protein
Science. 16:1569-1576)
CRYSTALP2 and PPCpred
CRYSTALP2
- for in-silico prediction of protein crystallization propensity.
(Reference: Kurgan L, et al. 2009. BMC Structural
Biology 9: 50);
and,
PPCpred
- sequence-based prediction of propensity for production of
diffraction-quality crystals, production of crystals, purification and
production of the protein material.
(Reference: M.J. Mizianty & L. Kurgan. 2011.
Bioinformatics 27: i24-i33).
Antimicrobial peptides, vaccines and toxins:
APD3
APD3
(Antimicrobial Peptide Database)
(Reference: Wang, G., Li, X. & Wang, Z. (2016) Nucl.
Acids Res. 44: D1087-D1093.)
T3SEdb and T3Enc
The Type III Secretion System (T3SS) is an essential mechanism for
host-pathogen interaction in the infection process. The proteins secreted
through the T3SS machinery of many Gram-negative bacteria are known as T3SS
effectors (T3SEs). These can either be localized subcellularly in the host,
or be part of the needle tip of the T3SS that interacts directly with the
host membrane to bring other effectors into the target cell.
T3SEdb
is an effort to assemble a comprehensive database of all
experimentally determined and putative T3SEs into a web-accessible site.
BLAST search is available.
(Reference: Wang Y et al. 2012. BMC Bioinformatics.
13: 66.).
T3Enc
- is an encyclopedia of bacterial type III secretion systems and an update
on the discovery of T3SSs.
(Reference: Hu Y et al. (2017) Environ. Microbiol.
19(10): 3879-3895).
Bastion3
Bastion3
- is a two-layer ensemble predictor developed to accurately identify type
III secreted effectors from protein sequence data. This program explores a
wide range of features, from various types, trains single models based on
these features, and finally integrates these models through ensemble
learning. Specifically, the models were trained using a gradient boosting
machine, LightGBM, and their performance was further boosted through a
genetic algorithm (GA) based two-step parameter optimization
strategy.
(Reference: Wang J et al. (2019) Bioinformatics,
35(12): 2017-2028)
Effectidor II
Effectidor II
- is a pan-genomic AI-based algorithm for the prediction of type III
secretion system effectors in input bacterial genomes. Graphical and
tabular results are displayed as output.
(Reference: Wagner N et al. (2025) Bioinformatics,
41(5): btaf272)
T3SEpp
T3SEpp
- is a prediction pipeline which integrates the results of individual
modules, resulting in high accuracy (i.e., ∼0.94) and >1-fold reduction in
the false-positive rate compared to that of state-of-the-art software
tools.
(Reference: Hui X et al. 2020. mSystems 5 (4):
e00288-20).
Effective
Effective (University of Vienna, Austria & Technical University of Munich, Germany) - Bacterial protein secretion is the key virulence mechanism of symbiotic and pathogenic bacteria. Effector proteins are transported from the bacterial cytosol into the extracellular medium or directly into the eukaryotic host cell. The Effective portal provides precalculated predictions of bacterial effectors in all publicly available pathogenic and symbiotic genomes, and also lets users predict effectors in their own protein sequence data.
Vaxign
Vaxign
is the first web-based vaccine design system that predicts vaccine targets
based on genome sequences using the strategy of reverse vaccinology.
Predicted features in the Vaxign pipeline include protein subcellular
location, transmembrane helices, adhesin probability, conservation to human
and/or mouse proteins, sequence exclusion from genome(s) of nonpathogenic
strain(s), and epitope binding to MHC class I and class II. The precomputed
Vaxign database contains prediction of vaccine targets for >350 genomes.
(Reference: He Y et al. 2010. J Biomed Biotechnol.
2010: 297505).
VacTarBac
VacTarBac
is a platform which stores vaccine candidates against several pathogenic
bacteria. The vaccines are designed on the basis of their probability to act
as epitopes, and thus have the potential to induce any of the several arms
of the immune system. These epitopes have been predicted against the
virulence factors and essential genes of 14 bacterial species.
(Reference: Nagpal G et al. (2018) Front Immunol. 9:
2280).
Abpred
Abpred
- will take a single amino acid sequence for a Fv and calculate the
predicted performance on 12 biophysical platforms
(Reference: Hebditch M & J Warwicker (2019) PeerJ. 7:
e8199).
Victors
Victors
- is a database comprised of genes experimentally observed to be necessary
for virulence. Included are virulence factors for many different bacteria,
viruses, parasites and fungi, which are pathogenic to animals and humans.
Within Victors are virulence factors, as well as corresponding sequence
information taken from NCBI when available. LPS and capsule structures are
also included as virulence factors, but do not have attached sequence
information as they are tertiary gene products and therefore do not have
singular sequence data available. Has a BLAST interface.
(Reference: Sayers S et al. (2019) Nucleic Acids Res. 47(D1): D693-D700)
Circular dichroism:
DICHROWEB
Circular Dichroism (Birkbeck College, School of Crystallography, England) - DICHROWEB is an interactive web site which allows the deconvolution of data from Circular Dichroism spectroscopy experiments. It offers an interface to a range of deconvolution algorithms (CONTINLL, SELCON3, CDSSTR, VARSLC, K2D).
K2D2
K2D2:
Prediction of percentages of protein secondary structure from CD spectra -
allows analysis of 41 CD spectrum data points ranging from 200 nm to 240 nm
or 51 data points for the 190-240 nm range
(Reference: Perez-Iratxeta C & Andrade-Navarro MA.
2008. BMC Structural Biology 8: 25)
K2D3
K2D3
is a web server to estimate the α-helix and β-strand content of a protein
from its circular dichroism spectrum. K2D3 uses a database of theoretical
spectra derived with Dichrocalc
(Reference: Louis-Jeune C et al. 2012. Proteins:
Structure, Function, & Bioinformatics 80: 374-381)
Hydrophobicity Plotter and Protein Hydroplotter
Hydrophobicity Plotter (Innovagen) and Protein Hydroplotter (ProteinLounge, San Diego, CA; select it under Tools).
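A hydrophobicity plot is typically a sliding-window average of per-residue scores along the sequence. The sketch below uses the Kyte-Doolittle hydropathy scale and a hypothetical function name; the plotters listed above may use other scales, window sizes or smoothing.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 9) -> list:
    """Mean hydropathy of every full window along the sequence.

    Raises KeyError for residue codes outside the 20 standard amino acids.
    """
    sequence = sequence.upper()
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]
```

Each value corresponds to one window position, so a sequence of length n yields n - window + 1 points to plot.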
ChiraKit
ChiraKit
- features include the calculation of protein secondary structure with the SELCON3 and SESCA algorithms, estimation of
peptide helicity using the helix-ensemble model, the fitting of thermal/chemical unfolding or user-defined models, and the
decomposition of spectra through singular value decomposition or principal component analysis.
(Reference: Burastero O et al. 2025. Nucleic Acids Research 53(W1): W158 - W168).
BeStSel
BeStSel
(Beta Structure Selection) - the main problem of protein CD spectroscopy is the spectral variability of β-structures. This
web server provides tools to the community for CD spectrum analysis. BeStSel uniquely provides information on eight secondary
structure components, including parallel β-structure and antiparallel β-sheets with three different twist groups. It
outperforms all available methods in accuracy and information content, and is also able to predict protein folds down to the
topology/homology level of the CATH classification.
(Reference: Micsonai A et al. 2025. Nucleic Acids Research 53(W1): W73 - W83).
Proteolysis and Mass Spectrometry:
An excellent proteomic resource is the Rockefeller University Proteomics Resource Center's Useful Links.
PeptideCutter
Proteolysis - PeptideCutter (ExPASy, Switzerland) predicts cleavage sites for enzymes and chemicals in a protein sequence.
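As an illustration of what a cleavage-site prediction involves, here is a minimal sketch of an in-silico tryptic digest using the simple textbook rule: cleave C-terminal to lysine or arginine unless the next residue is proline. The function name is hypothetical, and PeptideCutter itself supports many more enzymes and more detailed rules.

```python
def tryptic_digest(sequence: str) -> list:
    """Split a sequence after K or R, except when the next residue is P."""
    sequence = sequence.upper()
    peptides = []
    start = 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in ("K", "R") and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide with no K/R end
    return peptides
```

Missed cleavages are not modelled, so real digests will usually contain longer peptides as well.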
FindMod and GlycoMod
For more sophisticated protein analysis involving mass spectrometry, ExPASy has introduced FindMod, to predict potential protein post-translational modifications in peptides, and GlycoMod, which can predict the possible oligosaccharide structures that occur on proteins from their experimentally determined masses.
ProteinProspector
ProteinProspector (Dr. Alma Burlingame, University of California) - offers a wide variety of tools (e.g. MS-Fit, MS-Tag, MS-Seq, MS-Pattern, MS-Homology) for the protein mass spectrometrist.
Repeats:
Radar and REPRO
Repeats in protein sequences can be discovered using
Radar
(Rapid Automatic Detection and Alignment of Repeats, European
Bioinformatics Institute) or
REPRO
(Reference: George RA. & Heringa J. 2000. Trends
Biochem. Sci. 25: 515-517).
REP2
REP2
- is a web server to detect common tandem repeats in protein sequences.
(Reference: Kamel M, et al. (2021) J. Molec. Biol.
433(11):166895).
Two-dimensional gels:
JVirGel
JVirGel
calculation of virtual two-dimensional protein gels - creates virtual 2D
proteomes from a huge list of eukaryotes & prokaryotes (or an individual
protein).
(Reference: K. Hiller et al. 2003. Nucl. Acids Res.
31: 3862-3865).
Virtual Two-Dimensional Protein Gels
Draw Virtual Two-Dimensional Protein Gels (PRODORIC Net, Germany) - using your own protein sequence data or for different organisms. Also see Proteome-pI which is a database of pre-computed isoelectric points and molecular weights for proteins and digest peptides from model organism proteomes
Metasite:
Scratch Protein Predictor
Scratch Protein Predictor - (Institute for Genomics and Bioinformatics, University of California, Irvine) - programs include: ACCpro: the relative solvent accessibility of protein residues; CMAPpro: Prediction of amino acid contact maps; COBEpro: Prediction of continuous B-cell epitopes; CONpro: predicts whether the number of contacts of each residue in a protein is above or below the average for that residue; DIpro: Prediction of disulphide bridges; DISpro: Prediction of disordered regions; DOMpro: Prediction of domains; SSpro: Prediction of protein secondary structure; SVMcon: Prediction of amino acid contact maps using Support Vector Machines; and, 3Dpro: Prediction of protein tertiary structure (Ab Initio).
Mutagenesis:
Gene Mutagenesis Designer
Gene Mutagenesis Designer (GenScript) - simplifies the design of point mutations in a gene. Paste the wild-type gene sequence into the input field, click the "from selection" button to select the amino acid(s) of interest, and click "submit" to generate the new gene sequence encoding the mutated protein. A number of expression systems can be selected.
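At the sequence level, a point mutation amounts to replacing one codon in the coding sequence and checking the translation. The sketch below shows that step only, with hypothetical function names and the standard genetic code; it does not choose codons for a particular expression system, which is what the GenScript tool adds.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def mutate_codon(dna: str, residue_number: int, new_codon: str) -> str:
    """Replace the codon for a 1-based residue number with `new_codon`."""
    dna = dna.upper()
    new_codon = new_codon.upper()
    if new_codon not in CODON_TABLE:
        raise ValueError(f"not a valid codon: {new_codon}")
    start = (residue_number - 1) * 3
    if residue_number < 1 or start + 3 > len(dna):
        raise ValueError("residue number is outside the sequence")
    return dna[:start] + new_codon + dna[start + 3:]
```

For example, replacing the second codon of ATGGCTAAA (Met-Ala-Lys) with GAA gives a sequence encoding Met-Glu-Lys.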
I-Mutant
I-Mutant
- predicts protein stability changes upon mutation - choose either a PDB
reference number or paste your own protein. The answer (by email) indicates
whether the protein is more or less stable, a fact which could be of use in
designing "better" proteins.
(Reference: E. Capriotti et al. 2005. Nucl. Acids Res.
33: W306-W310).
SIFT
SIFT
- The Sorting Intolerant from Tolerant (SIFT) algorithm predicts the effect
of coding variants on protein function i.e. it predicts whether an amino
acid substitution affects protein function based on sequence homology and
the physical properties of amino acids. SIFT can be applied to naturally
occurring nonsynonymous polymorphisms and laboratory-induced missense
mutations.
(Reference: N-L Sim et al. 2012. Nucleic Acids
Research; 40(1): W452-W457).
mCSM-membrane
mCSM-membrane
- predicts the effects of mutations on transmembrane proteins.
(Reference: Pires DEV et al. 2020. Nucl Acids Res 48
(W1): W147-W153).
PlaToLoCo
PlaToLoCo
(PLAtform of TOols for LOw COmplexity) - is the first web meta-server for
visualization and annotation of low complexity regions in proteins which
employs five different state-of-the-art tools for discovering LCRs and
provides functional annotations such as domain detection, transmembrane
segment prediction, and calculation of amino acid frequencies.
(Reference: Jarnot P et al. 2020. Nucl Acids Res 48
(W1): W77-W84).
Updated: February, 2026