Protein Chemistry

Background Information: You might want to consult Robert Russell's Guide to Structure Prediction or Heinz Reiske's more recent "A Beginner's Guide to Protein Structure Prediction." For the biochemical properties of amino acids, see Amino Acid Hydrophobicity and the Amino Acid Chart and Reference Table (GenScript). If you are specifically interested in antibodies, I would recommend that you visit "The Antibody Resource Page."

Table of amino acid abbreviations


Amino acid composition, mass & pI:

ProtParam

Amino acid composition & Mass - ProtParam (ExPASy, Switzerland)


Protein Molecular Weight Calculator

Protein Molecular Weight Calculator - (Altogen Labs, Austin, Texas) - allows batch determination of protein molecular weights
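
Molecular weight calculators of this kind essentially add up residue masses plus one water molecule for the free termini. A minimal Python sketch of that sum, assuming commonly tabulated average residue masses (individual tools may use slightly different tables) and an unmodified linear chain. The names `average_mass` and `batch_masses` are hypothetical; the batch helper keys results by sequence name so batch output stays identifiable.

```python
# Average (not monoisotopic) residue masses in daltons, as commonly tabulated.
# These values are an assumption of this sketch; tools may differ slightly.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_AVG = 18.01524  # one water accounts for the free N- and C-termini


def average_mass(sequence: str) -> float:
    """Average molecular weight (Da) of an unmodified linear polypeptide."""
    seq = sequence.upper()
    unknown = set(seq) - set(AVG_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER_AVG


def batch_masses(records: dict[str, str]) -> dict[str, float]:
    """Batch mode keyed by sequence name rather than by sequential number."""
    return {name: average_mass(seq) for name, seq in records.items()}


if __name__ == "__main__":
    print(round(average_mass("GG"), 2))  # glycylglycine, prints 132.12
```

Each disulfide bridge would lower the result by roughly 2.016 Da (loss of two hydrogen atoms), and other modifications shift it further, so this sketch only covers the plain sequence.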


Compute pI/Mw tool

Isoelectric Point - Compute pI/Mw tool (ExPASy, Switzerland). If you want a plot of the relationship between charge and pH, use ProteinChemist (ProteinChemist.com).
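
pI and charge-versus-pH calculators rest on the Henderson-Hasselbalch equation: each ionizable group contributes a fractional charge, and the pI is the pH at which the sum is zero. A minimal Python sketch, assuming one illustrative pKa set (published tools use different sets, which is why their pI values disagree) and hypothetical function names. Because net charge falls monotonically as pH rises, simple bisection locates the pI.

```python
# Illustrative pKa values (an assumption; tools differ in the set they use).
PKA = {"Nterm": 8.6, "Cterm": 3.6, "K": 10.8, "R": 12.5, "H": 6.5,
       "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
POSITIVE = "KRH"   # side chains that are positive when protonated
NEGATIVE = "DECY"  # side chains that are negative when deprotonated


def net_charge(sequence: str, pH: float) -> float:
    """Net charge from the Henderson-Hasselbalch equation for every group."""
    charge = 1 / (1 + 10 ** (pH - PKA["Nterm"]))    # free amino terminus
    charge -= 1 / (1 + 10 ** (PKA["Cterm"] - pH))   # free carboxyl terminus
    for aa in sequence.upper():
        if aa in POSITIVE:
            charge += 1 / (1 + 10 ** (pH - PKA[aa]))
        elif aa in NEGATIVE:
            charge -= 1 / (1 + 10 ** (PKA[aa] - pH))
    return charge


def isoelectric_point(sequence: str, lo: float = 0.0, hi: float = 14.0,
                      tol: float = 1e-4) -> float:
    """Bisection on pH: charge > 0 means the pI lies at a higher pH."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(sequence, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def charge_curve(sequence: str, start: float = 0.0, stop: float = 14.0,
                 step: float = 0.5) -> list[tuple[float, float]]:
    """(pH, net charge) pairs, i.e. the data behind a charge-vs-pH plot."""
    n = int(round((stop - start) / step)) + 1
    return [(start + i * step, net_charge(sequence, start + i * step))
            for i in range(n)]
```

For a residue with no ionizable side chain, such as glycine, the sketch returns a pI halfway between the two terminal pKa values (6.1 with this set).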


Peptide Calculator

Peptide Calculator (Bachem AG, Bubendorf, Switzerland) - calculates the molecular weight (Mr), average hydrophobicity, isoelectric point, and charge at pH 7.0


PEPSTATS and Biochemistry-online

Mass, pI, composition and mol% acidic, basic, aromatic, polar etc. amino acids - PEPSTATS (EMBOSS). Biochemistry-online (Vitalonic, Russia) gives the percentage composition, molecular weight, pI, and charge at any desired pH.


Peptide Molecular Weight Calculator

Peptide Molecular Weight Calculator (GenScript) - the online calculator determines the chemical formula and molecular weight of your peptide of interest. You can also specify post-translational modifications, such as N- and C-terminal modifications and the positions of disulfide bridges, to obtain more accurate outputs.


Isoelectric Point Calculator 2.0

Isoelectric Point Calculator 2.0 (IPC 2.0) - is a server for predicting isoelectric points and pKa values using a mixture of deep learning and support vector regression models. Its prediction accuracy (RMSD) for proteins and peptides is better than that of previous algorithms.
(Reference: Kozlowski LP (2022) Nucl. Acids Res. 50(D1): D1535-D1540).


Composition/Molecular Weight Calculation

Composition/Molecular Weight Calculation (Georgetown University Medical Center, U.S.A.) - the only problem with this site is that, when run in batch mode, it identifies sequences only by sequential number, not by name.


Batch Protein Isoelectric Point determination

Batch Protein Isoelectric Point determination - part of the Sequence Manipulation Suite or ENDMEMO


Batch Protein Molecular Weight determination

Batch Protein Molecular Weight determination - part of the Sequence Manipulation Suite or ENDMEMO


Protein calculator

Protein calculator (C. Putnam, The Scripps Research Institute, U.S.A.) - calculates mass, pI, charge at a given pH, counts amino acid residues etc.


Tm Predictor

Tm Predictor (P.C. Lyu Lab., National Tsing-Hua University, Taiwan) - calculates the theoretical protein melting temperature.


Computation of size of DNA and Protein Fragments from Their Electrophoretic Mobility

Computation of size of DNA and Protein Fragments from Their Electrophoretic Mobility
(Reference: Raghava, G. P. S. 2001. Biotech Software and Internet Report 2:198-200).
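
The classic way to size a band is a standard curve: over a gel's useful range, log10 of marker size (Da for proteins, bp for DNA) is roughly linear in relative mobility (Rf). A minimal Python sketch of that semi-log least-squares fit; the server cited above may use other fitting methods, and the function names are hypothetical.

```python
import math


def relative_mobility(band_mm: float, dye_front_mm: float) -> float:
    """Rf = distance migrated by the band / distance migrated by the dye front."""
    return band_mm / dye_front_mm


def fit_standard_curve(markers):
    """Least-squares line log10(size) = intercept + slope * Rf.

    markers: iterable of (size, rf) pairs measured on the same gel.
    """
    markers = list(markers)
    if len(markers) < 2:
        raise ValueError("need at least two marker bands")
    xs = [rf for _, rf in markers]
    ys = [math.log10(size) for size, _ in markers]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_size(rf: float, slope: float, intercept: float) -> float:
    """Convert an unknown band's Rf back to a size via the fitted line."""
    return 10 ** (intercept + slope * rf)
```

Bands outside the Rf range spanned by the markers fall where the linear assumption usually breaks down, so extrapolated sizes should be treated with caution.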


Antigenicity and allergenicity:

A good place to start is the Immune Epitope Database (IEDB).


AllerTOP, AlgPred and SDAP

Allergenicity servers: AllerTOP
(Reference: Dimitrov, I. et al. 2013. BMC Bioinformatics 14(Suppl 6): S4),
AlgPred - prediction of allergenic proteins and mapping of IgE epitopes
(Reference: Saha, S. and Raghava, G.P.S. 2006. Nucleic Acids Research 34: W202-W209.),
and SDAP - Structural Database of Allergenic Proteins
(Reference: Ivanciuc, O. et al. 2003. Nucleic Acids Res. 31: 359-362).


VIOLIN

VIOLIN - Vaccine Investigation and OnLine Information Network - allows easy curation, comparison and analysis of vaccine-related research data across various human pathogens. VIOLIN is expected to become a centralized source of vaccine information and to provide investigators in basic and clinical sciences with curated data and bioinformatics tools for vaccine research and development. VBLAST (Customized BLAST Search for Vaccine Research) allows various search strategies against 77 genomes of 34 pathogens.
(Reference: He, Y. et al. 2014. Nucleic Acids Res. 42(Database issue): D1124-32).


SVMTriP

SVMTriP - is a method for predicting antigenic epitopes using the latest sequence data from the IEDB database. It uses a Support Vector Machine (SVM) that combines tri-peptide similarity and propensity scores to improve prediction performance. SVMTriP can also recognize viral peptides against a human protein sequence background.
(Reference: Yao B et al. (2012) PLoS One 7(9): e45152).


AllergyPred

AllergyPred - is a web server that predicts both protein- and chemical-based allergens. Five different models take protein IDs, sequences, chemical IDs, and structures as inputs for predicting respective allergy endpoints.
(Reference: Kemmler E et al. 2025. Nucleic Acids Research 53(W1): W4 - W10).


Solubility and crystallizability:

EnzymeMiner

EnzymeMiner - offers automated mining of soluble enzymes with diverse structures, catalytic properties and stabilities. The solubility prediction employs the in-house SoluProt predictor developed using machine learning.
(Reference: Hon J et al. 2020. Nucl Acids Res 48 (W1): W104-W109).


ESPRESSO

ESPRESSO (EStimation of PRotein ExpreSsion and SOlubility) - is a sequence-based predictor for estimating protein expression and solubility for three different protein expression systems: in vivo Escherichia coli, Brevibacillus, and wheat germ cell-free.
(Reference: Hirose S, & Noguchi T. 2013. Proteomics. 13:1444-1456).


SABLE

SABLE - Accurate sequence-based prediction of relative Solvent AccessiBiLitiEs, secondary structures and transmembrane domains for proteins of unknown structure.
(Reference: Adamczak R et al. 2004. Proteins 56:753-767).


Protein–Sol

Protein–Sol - is a web server for predicting protein solubility. Using available data for Escherichia coli protein solubility in a cell-free expression system, 35 sequence-based properties are calculated. Feature weights are determined from separation of low and high solubility subsets. The model returns a predicted solubility and an indication of the features which deviate most from average values.
(Reference: Hebditch M et al. 2017. Bioinformatics 33(19): 3098–3100).


SOLUPROT

SOLUPROT - was created using the gradient boosting machine technique with the TargetTrack database as a training set. When evaluated against a balanced independent test set derived from the NESG database, SoluProt's accuracy of 58.5% and AUC of 0.62 exceeded those of a suite of alternative solubility prediction tools.
(Reference: Hon, J., et al. (2021) Bioinformatics 37 (1): 23-28).


CamSol

CamSol - is a method for the rational design of protein variants with enhanced solubility. It performs a rapid computational screening of tens of thousands of mutations to identify those with the greatest impact on the solubility of the target protein while maintaining its native state and biological activity.
(Reference: Sormanni P et al. (2015) J Molec Biol 427(2): 478-490). N.B. Requires registration.


SERp

Surface Entropy Reduction prediction (SERp) - this exploratory tool aims to aid identification of sites that are most suitable for mutation designed to enhance crystallizability by a Surface Entropy Reduction approach.
(Reference: Goldschmidt L. et al. 2007. Protein Science. 16:1569-1576)
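
Surface entropy reduction targets clusters of flexible, high-entropy residues (typically Lys, Glu and Gln) and replaces them with smaller residues, usually alanine. A minimal Python sketch that only scans the sequence for such clusters; it ignores solvent exposure and any structural or evolutionary information SERp may take into account, and the cluster rule (gap of at most two positions, at least two residues) is an assumption for illustration. Function names are hypothetical.

```python
HIGH_ENTROPY = set("KEQ")  # residues commonly targeted by SER


def ser_candidate_clusters(sequence: str, max_gap: int = 2, min_size: int = 2):
    """Group K/E/Q residues whose positions differ by <= max_gap.

    Returns clusters as lists of (1-based position, residue) pairs.
    """
    seq = sequence.upper()
    positions = [i for i, aa in enumerate(seq) if aa in HIGH_ENTROPY]
    clusters, current = [], []
    for pos in positions:
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_size:
                clusters.append(current)
            current = []
        current.append(pos)
    if len(current) >= min_size:
        clusters.append(current)
    return [[(p + 1, seq[p]) for p in cluster] for cluster in clusters]


def alanine_mutations(cluster) -> str:
    """Format a cluster as a combined mutation, e.g. 'K3A/E4A'."""
    return "/".join(f"{aa}{pos}A" for pos, aa in cluster)
```

For example, `ser_candidate_clusters("AAKEAAAQAA")` reports one cluster, Lys3 and Glu4, and the isolated Gln8 is not reported.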


CRYSTALP2 and PPCpred

CRYSTALP2 - for in-silico prediction of protein crystallization propensity.
(Reference: Kurgan L, et al. 2009. BMC Structural Biology 9: 50);
and, PPCpred - sequence-based prediction of propensity for production of diffraction-quality crystals, production of crystals, purification and production of the protein material.
(Reference: M.J. Mizianty & L. Kurgan. 2011. Bioinformatics 27: i24-i33).


Antimicrobial peptides, vaccines and toxins:

APD3

APD3 (Antimicrobial Peptide Database)
(Reference: Wang, G., Li, X. & Wang, Z. (2016) Nucl. Acids Res. 44: D1087-D1093.)


T3SEdb and T3Enc

The Type III Secretion System (T3SS) is an essential mechanism for host-pathogen interaction in the infection process. The proteins secreted through the T3SS machinery of many Gram-negative bacteria are known as T3SS effectors (T3SEs). These can either be localized subcellularly in the host or be part of the needle tip of the T3SS that interacts directly with the host membrane to bring other effectors into the target cell. T3SEdb is an effort to assemble a comprehensive database of all experimentally determined and putative T3SEs in a web-accessible site. BLAST search is available.
(Reference: Wang Y et al. 2012. BMC Bioinformatics. 13: 66.).
T3Enc - is an encyclopedia of bacterial type III secretion systems and an update on the discovery of T3SSs.
(Reference: Hu Y et al. (2017) Environ. Microbiol. 19(10): 3879-3895).


Bastion3

Bastion3 - is a two-layer ensemble predictor developed to accurately identify type III secreted effectors from protein sequence data. It explores a wide range of feature types, trains a single model on each, and then integrates these models through ensemble learning. The models were trained with the gradient boosting machine LightGBM, and their performance was further improved by a genetic algorithm (GA) based two-step parameter optimization strategy.
(Reference: Wang J et al. (2019) Bioinformatics, 35(12): 2017-2028)


Effectidor II

Effectidor II - is a pan-genomic AI-based algorithm for the prediction of type III secretion system effectors in input bacterial genomes. Graphical and tabular results are displayed as output.
(Reference: Wagner N et al. (2025) Bioinformatics, 41(5): btaf272)


T3SEpp

T3SEpp - is a prediction pipeline which integrates the results of individual modules, resulting in high accuracy (i.e., ∼0.94) and >1-fold reduction in the false-positive rate compared to that of state-of-the-art software tools.
(Reference: Hui X et al. 2020. mSystems 5 (4): e00288-20).


Effective

Effective (University of Vienna, Austria & Technical University of Munich, Germany) - Bacterial protein secretion is the key virulence mechanism of symbiotic and pathogenic bacteria, whereby effector proteins are transported from the bacterial cytosol into the extracellular medium or directly into the eukaryotic host cell. The Effective portal provides precalculated predictions of bacterial effectors in all publicly available pathogenic and symbiotic genomes and also lets users predict effectors in their own protein sequence data.


Vaxign

Vaxign is the first web-based vaccine design system that predicts vaccine targets based on genome sequences using the strategy of reverse vaccinology. Predicted features in the Vaxign pipeline include protein subcellular location, transmembrane helices, adhesin probability, conservation to human and/or mouse proteins, sequence exclusion from genome(s) of nonpathogenic strain(s), and epitope binding to MHC class I and class II. The precomputed Vaxign database contains prediction of vaccine targets for >350 genomes.
(Reference: He Y et al. 2010. J Biomed Biotechnol. 2010: 297505).


VacTarBac

VacTarBac is a platform that stores vaccine candidates against several pathogenic bacteria. The candidates are designed on the basis of their probability of acting as epitopes and thus have the potential to induce one or more arms of the immune system. These epitopes have been predicted for the virulence factors and essential genes of 14 bacterial species.
(Reference: Nagpal G et al. (2018) Front Immunol. 9: 2280).


Abpred

Abpred - takes a single amino acid sequence for an Fv and calculates its predicted performance on 12 biophysical platforms.
(Reference: Hebditch M & J Warwicker (2019) PeerJ. 7: e8199).


Victors

Victors - is a database of genes experimentally observed to be necessary for virulence. It includes virulence factors for many different bacteria, viruses, parasites and fungi that are pathogenic to animals and humans, together with corresponding sequence information taken from NCBI when available. LPS and capsule structures are also included as virulence factors but have no attached sequence information, as they are tertiary gene products and therefore have no single sequence available. A BLAST interface is available.
(Reference: Sayers S et al. (2019) Nucleic Acids Res. 47(D1): D693-D700)


Circular dichroism:

DICHROWEB

Circular Dichroism (Birkbeck College, School of Crystallography, England) - DICHROWEB is an interactive website that allows the deconvolution of data from circular dichroism spectroscopy experiments. It offers an interface to a range of deconvolution algorithms (CONTINLL, SELCON3, CDSSTR, VARSLC, K2D).
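
Several CD deconvolution approaches build on the idea that a measured spectrum can be approximated as a linear combination of basis or reference spectra, with the fitted coefficients read as secondary-structure fractions. A minimal NumPy sketch of that idea using an ordinary least-squares fit. The three basis curves are made-up Gaussian shapes, not real reference spectra, and the 190-240 nm grid simply mirrors the range quoted for K2D2; the algorithms DICHROWEB offers use reference protein sets and additional constraints.

```python
import numpy as np

# 51 points at 1 nm steps over 190-240 nm.
WAVELENGTHS = np.arange(190, 241)


def _band(wl, center, width):
    return np.exp(-(((wl - center) / width) ** 2))


def toy_basis(wl=WAVELENGTHS):
    """Hypothetical helix / sheet / coil curves; NOT real reference spectra."""
    helix = 2.0 * _band(wl, 192, 4) - _band(wl, 208, 6) - _band(wl, 222, 8)
    sheet = _band(wl, 197, 5) - _band(wl, 217, 7)
    coil = -_band(wl, 198, 5)
    return np.column_stack([helix, sheet, coil])


def deconvolve(spectrum, basis):
    """Least-squares fractions f minimising ||basis @ f - spectrum||,
    with negative values clipped and the result renormalised to sum to 1."""
    fractions, *_ = np.linalg.lstsq(basis, spectrum, rcond=None)
    fractions = np.clip(fractions, 0.0, None)
    return fractions / fractions.sum()
```

With noise-free synthetic input built as `toy_basis() @ [0.5, 0.3, 0.2]`, the fit returns those fractions; real spectra are noisy and never exactly a combination of a few basis curves, which is why the dedicated algorithms exist.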


K2D2

K2D2: Prediction of percentages of protein secondary structure from CD spectra - allows analysis of either 41 CD spectrum data points covering 200-240 nm or 51 data points covering 190-240 nm.
(Reference: Perez-Iratxeta C & Andrade-Navarro MA. 2008. BMC Structural Biology 8: 25)


K2D3

K2D3 is a web server to estimate the α-helix and β-strand content of a protein from its circular dichroism spectrum. K2D3 uses a database of theoretical spectra derived with DichroCalc.
(Reference: Louis-Jeune C et al. 2012. Proteins: Structure, Function, & Bioinformatics 80: 374-381)


Hydrophobicity Plotter and Protein Hydroplotter

Hydrophobicity Plotter (Innovagen) and Protein Hydroplotter (ProteinLounge, San Diego, CA; select it under Tools).
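
Hydrophobicity plotters slide a window along the sequence and average a per-residue hydropathy scale. A minimal Python sketch using the Kyte-Doolittle scale (chosen here as an assumption; plotters may offer other scales) and a hypothetical default window of 9. `gravy` returns the grand average of hydropathy (GRAVY), one common way to express a sequence's average hydrophobicity.

```python
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 9) -> list[tuple[int, float]]:
    """(1-based centre position, mean hydropathy) for each full window."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    half = window // 2
    return [(i + half + 1, sum(scores[i:i + window]) / window)
            for i in range(len(scores) - window + 1)]


def gravy(sequence: str) -> float:
    """Grand average of hydropathy over the whole sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return sum(scores) / len(scores)
```

Longer windows smooth the curve; windows around 19 residues are often used when looking for membrane-spanning segments, while shorter ones highlight surface-exposed stretches.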


ChiraKit

ChiraKit - features include the calculation of protein secondary structure with the SELCON3 and SESCA algorithms, estimation of peptide helicity using the helix-ensemble model, the fitting of thermal/chemical unfolding or user-defined models, and the decomposition of spectra through singular value decomposition or principal component analysis.
(Reference: Burastero O et al. 2025. Nucleic Acids Research 53(W1): W158 - W168).


BeStSel

BeStSel (Beta Structure Selection) - a major problem in protein CD spectroscopy is the spectral variability of β-structures. This web server provides tools to the community for CD spectrum analysis. BeStSel uniquely provides information on eight secondary structure components, including parallel β-structure and antiparallel β-sheets with three different twist groups. According to its authors, it outperforms all available methods in accuracy and information content, and it can also predict protein folds down to the topology/homology level of the CATH classification.
(Reference: Micsonai A et al. 2025. Nucleic Acids Research 53(W1): W73 - W83).


Proteolysis and Mass Spectrometry:

An excellent proteomics resource is the Rockefeller University Proteomics Resource Center's Useful Links.


PeptideCutter

Proteolysis - PeptideCutter (ExPASy, Switzerland) predicts potential cleavage sites for enzymes and chemicals in a given protein sequence.
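
Cleavage predictors apply a rule per enzyme. A minimal Python sketch of the textbook trypsin rule (cleave after K or R, but not before P); this is a simplification, since real tools can apply further exceptions, and the `missed_cleavages` parameter is an assumption added to model incomplete digestion. The function name is hypothetical.

```python
def trypsin_digest(protein: str, missed_cleavages: int = 0) -> list[str]:
    """Peptides from the simplified 'after K/R, not before P' trypsin rule."""
    seq = protein.upper()
    # Cut positions sit between residue i and i + 1.
    sites = [i + 1 for i in range(len(seq) - 1)
             if seq[i] in "KR" and seq[i + 1] != "P"]
    bounds = [0] + sites + [len(seq)]
    peptides = []
    for i in range(len(bounds) - 1):
        # Also join up to `missed_cleavages` following fragments.
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            peptides.append(seq[bounds[i]:bounds[j]])
    return peptides
```

For example, `trypsin_digest("MAKPGRSTKAA")` returns `['MAKPGR', 'STK', 'AA']`: the Lys followed by Pro is not cut.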


FindMod and GlycoMod

For more sophisticated protein analysis involving mass spectrometry, ExPASy has introduced FindMod, which predicts potential post-translational modifications in peptides, and GlycoMod, which predicts the possible oligosaccharide structures that occur on proteins from their experimentally determined masses.
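
Modification finders of this kind work from the difference between a measured peptide mass and the mass predicted from sequence. A minimal Python sketch of that mass-difference idea, assuming monoisotopic residue masses, singly protonated [M+H]+ ions and a small hand-picked list of modifications with their standard mass shifts; FindMod itself covers far more modifications and rules, and `candidate_modifications` is a hypothetical name.

```python
MONO_RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276

# Hand-picked examples: (monoisotopic mass shift, residues that can carry it).
MODIFICATIONS = {
    "Phosphorylation": (79.96633, "STY"),
    "Oxidation": (15.99491, "M"),
    "Acetylation": (42.01057, "K"),  # N-terminal acetylation not modeled
    "Methylation": (14.01565, "KR"),
}


def peptide_mh(peptide: str) -> float:
    """Theoretical monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(MONO_RESIDUE[aa] for aa in peptide.upper()) + WATER + PROTON


def candidate_modifications(peptide: str, observed_mh: float,
                            tol: float = 0.02) -> list[str]:
    """Modifications whose shift explains the observed mass difference
    and whose target residue occurs in the peptide."""
    seq = peptide.upper()
    delta = observed_mh - peptide_mh(seq)
    return [name for name, (shift, residues) in MODIFICATIONS.items()
            if abs(delta - shift) <= tol and any(r in seq for r in residues)]
```

A +80 Da shift on a peptide containing Ser would be reported as a possible phosphorylation, while the same shift on a peptide with no S, T or Y would not.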


ProteinProspector

ProteinProspector (Dr. Alma Burlingame, University of California) - offers a wide variety of tools (e.g. MS-Fit, MS-Tag, MS-Seq, MS-Pattern, MS-Homology) for the protein mass spectrometrist.


Repeats:

Radar and REPRO

Repeats in protein sequences can be discovered using Radar (Rapid Automatic Detection and Alignment of Repeats, European Bioinformatics Institute) or REPRO
(Reference: George RA. & Heringa J. 2000. Trends Biochem. Sci. 25: 515-517).


REP2

REP2 - is a web server to detect common tandem repeats in protein sequences.
(Reference: Kamel M, et al. (2021) J. Molec. Biol. 433(11):166895).


Two-dimensional gels:

JVirGel

JVirGel calculation of virtual two-dimensional protein gels - creates virtual 2D proteomes for a large list of eukaryotes and prokaryotes (or for an individual protein).
(Reference: K. Hiller et al. 2003. Nucl. Acids Res. 31: 3862-3865).


Virtual Two-Dimensional Protein Gels

Draw Virtual Two-Dimensional Protein Gels (PRODORIC Net, Germany) - using your own protein sequence data or for different organisms. Also see Proteome-pI, a database of pre-computed isoelectric points and molecular weights for proteins and digested peptides from model organism proteomes.


Metasite:

Scratch Protein Predictor

Scratch Protein Predictor - (Institute for Genomics and Bioinformatics, University of California, Irvine) - programs include: ACCpro: the relative solvent accessibility of protein residues; CMAPpro: Prediction of amino acid contact maps; COBEpro: Prediction of continuous B-cell epitopes; CONpro: predicts whether the number of contacts of each residue in a protein is above or below the average for that residue; DIpro: Prediction of disulphide bridges; DISpro: Prediction of disordered regions; DOMpro: Prediction of domains; SSpro: Prediction of protein secondary structure; SVMcon: Prediction of amino acid contact maps using Support Vector Machines; and, 3Dpro: Prediction of protein tertiary structure (Ab Initio).


Mutagenesis:

Gene Mutagenesis Designer

Gene Mutagenesis Designer (GenScript) - is designed to make point mutagenesis of DNA straightforward. You input the wild-type gene sequence, select the amino acid(s) of interest with the "from selection" button, and click "submit"; the tool then generates the new gene sequence encoding the mutated protein. You can choose among a number of expression systems.
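
Designing a point mutation means replacing the codon at the chosen position with a codon for the new amino acid. A minimal Python sketch, assuming the standard genetic code and choosing, among synonymous codons, the one needing the fewest base changes (ties broken alphabetically); a tool that lets you pick an expression system may instead choose codons by codon usage. `point_mutation` and `translate` are hypothetical helpers.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def point_mutation(dna: str, aa_position: int, new_aa: str) -> str:
    """Swap the codon for residue `aa_position` (1-based) for the codon of
    `new_aa` that differs from the original at the fewest bases."""
    dna, new_aa = dna.upper(), new_aa.upper()
    if aa_position < 1:
        raise ValueError("positions are 1-based")
    start = 3 * (aa_position - 1)
    old = dna[start:start + 3]
    if len(old) != 3:
        raise ValueError("position outside the coding sequence")
    candidates = sorted(c for c, aa in CODON_TABLE.items() if aa == new_aa)
    if not candidates:
        raise ValueError(f"unknown amino acid: {new_aa}")
    best = min(candidates, key=lambda c: sum(x != y for x, y in zip(c, old)))
    return dna[:start] + best + dna[start + 3:]
```

For example, `point_mutation("ATGGCT", 2, "G")` returns `"ATGGGT"`: Ala (GCT) becomes Gly (GGT) with a single base change.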


I-Mutant

I-Mutant - predicts protein stability changes upon mutation - choose either a PDB reference number or paste your own protein. The answer (by email) indicates whether the protein is more or less stable, a fact which could be of use in designing "better" proteins.
(Reference: E. Capriotti et al. 2005. Nucl. Acids Res. 33: W306-W310).


SIFT

SIFT - The Sorting Intolerant from Tolerant (SIFT) algorithm predicts the effect of coding variants on protein function i.e. it predicts whether an amino acid substitution affects protein function based on sequence homology and the physical properties of amino acids. SIFT can be applied to naturally occurring nonsynonymous polymorphisms and laboratory-induced missense mutations.
(Reference: N-L Sim et al. 2012. Nucleic Acids Research 40(W1): W452-W457).
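
SIFT's core idea is that a substitution rarely seen at a position in related sequences is likely to affect function. A greatly simplified Python sketch: it scores one alignment column by raw amino acid frequencies scaled to the most frequent residue and applies the 0.05 cutoff SIFT uses. Real SIFT gathers homologs itself and uses sequence weighting and pseudocounts, so the numbers here are illustrative only; the function names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def scaled_probabilities(column: str) -> dict[str, float]:
    """Frequency of each amino acid in one alignment column divided by the
    frequency of the most common residue (gaps '-' are ignored)."""
    counts = Counter(aa for aa in column.upper() if aa != "-")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("column contains only gaps")
    top = max(counts.values()) / total
    return {aa: (counts[aa] / total) / top for aa in AMINO_ACIDS}


def predict_substitution(column: str, new_aa: str,
                         threshold: float = 0.05) -> str:
    score = scaled_probabilities(column)[new_aa.upper()]
    return "tolerated" if score >= threshold else "deleterious"
```

In a column of nine Leu and one Ile, Ile scores about 0.11 (tolerated), while Asp never occurs and scores 0 (deleterious); pseudocounts in the real method keep unseen residues from scoring exactly zero.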


mCSM-membrane

mCSM-membrane - predicts the effects of mutations on transmembrane proteins.
(Reference: Pires DEV et al. 2020. Nucl Acids Res 48 (W1): W147-W153).


EnzymeMiner

EnzymeMiner - allows automated mining of soluble enzymes with diverse structures, catalytic properties and stabilities. The solubility prediction employs the in-house SoluProt predictor developed using machine learning.
(Reference: Hon J et al. 2020. Nucl Acids Res 48 (W1): W104-W109).


PlaToLoCo

PlaToLoCo (PLAtform of TOols for LOw COmplexity) - is the first web meta-server for visualization and annotation of low complexity regions (LCRs) in proteins. It employs five different state-of-the-art tools for discovering LCRs and provides functional annotations such as domain detection, transmembrane segment prediction, and calculation of amino acid frequencies.
(Reference: Jarnot P et al. 2020. Nucl Acids Res 48 (W1): W77-W84).
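
Many low-complexity detectors measure compositional complexity in a sliding window, and Shannon entropy is a common choice of measure. A minimal Python sketch, assuming a 12-residue window and a 2.2-bit threshold chosen for illustration (not PlaToLoCo's or any specific tool's settings); overlapping low-entropy windows are merged into regions. Function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Entropy in bits of the residue composition of a segment."""
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in Counter(segment).values())


def low_complexity_regions(sequence: str, window: int = 12,
                           threshold: float = 2.2) -> list[tuple[int, int]]:
    """1-based inclusive (start, end) regions made of merged windows whose
    entropy falls below the threshold."""
    seq = sequence.upper()
    regions: list[tuple[int, int]] = []
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) < threshold:
            start, end = i + 1, i + window
            if regions and start <= regions[-1][1] + 1:
                regions[-1] = (regions[-1][0], end)  # extend the open region
            else:
                regions.append((start, end))
    return regions
```

A poly-Q stretch has zero entropy and is flagged, whereas twelve different residues give log2(12), about 3.58 bits, and are not.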

Updated: February, 2026