PROTEIN CHEMISTRY

BACKGROUND INFORMATION: You might want to consult Robert Russell's Guide to Structure Prediction. For the biochemical properties of amino acids see PROWL, Amino Acid Hydrophobicity and Aminosauren. If you are specifically interested in antibodies, I would recommend that you visit "The Antibody Resource Page."

Table of amino acid abbreviations

Amino acid composition, mass & pI:

Amino acid composition & mass - ProtParam tool (ExPASy, Switzerland)
Isoelectric Point - Compute pI/Mw tool (ExPASy, Switzerland). If you want a plot of the relationship between charge and pH, use ProteinChemist (ProteinChemist.com) or JVirGel Proteomic Tools (PRODORIC Net, Germany).
Mass, pI, composition and mol% acidic, basic, aromatic, polar etc. amino acids - PEPSTATS (EMBOSS). Biochemistry-online (Vitalonic, Russia) reports % composition, molecular weight, pI, and charge at any desired pH.

IPC Isoelectric point calculator (Lukasz P. Kozlowski) provides a detailed analysis of the isoelectric point according to different scales for individual proteins, plus the average, together with the number of residues and mass.

Composition/Molecular Weight Calculation (Georgetown University Medical Center, U.S.A.) - the only problem with this site is that when run in batch mode it does not identify the sequence by name, merely by sequential number.

 Batch Protein Isoelectric Point determination - part of the Sequence Manipulation Suite

 Batch Protein Molecular Weight determination - part of the Sequence Manipulation Suite

Protein calculator (C. Putnam, The Scripps Research Institute, U.S.A.) - calculates mass, pI, charge at a given pH, counts amino acid residues etc.

 Computation of size of DNA and Protein Fragments from Their Electrophoretic Mobility (Reference: Raghava, G. P. S. 2001. Biotech Software and Internet Report 2:198-200).
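The quantities reported by the calculators in this section (composition, mass, charge at a given pH, and pI) can be sketched in a few lines. This is only an illustration, not the method of any listed server: the residue masses are standard average values, while the pKa table is an assumed, illustrative scale. Each tool uses its own pKa set, so its pI will differ somewhat from this sketch. All function names are hypothetical.

```python
from collections import Counter

# Average residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153

# ASSUMPTION: illustrative pKa values; real calculators each use their own scale.
PKA_POSITIVE = {"N_TERM": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"C_TERM": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def composition(seq):
    """Count each residue in a one-letter protein sequence."""
    return dict(Counter(seq))


def molecular_weight(seq):
    """Average mass: sum of residue masses plus one water for the termini."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def net_charge(seq, ph):
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    counts = Counter(seq)
    counts["N_TERM"] = 1
    counts["C_TERM"] = 1
    # Fraction of each basic group that is protonated (+1).
    positive = sum(counts[g] / (1 + 10 ** (ph - pka))
                   for g, pka in PKA_POSITIVE.items())
    # Fraction of each acidic group that is deprotonated (-1).
    negative = sum(counts[g] / (1 + 10 ** (pka - ph))
                   for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq, tol=1e-4):
    """Find the pH of zero net charge by bisection between pH 0 and 14."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positive, so the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2
```

Calling `net_charge(seq, ph)` over a range of pH values gives the data for a charge-versus-pH plot of the kind mentioned above. Bisection works here because the net charge decreases monotonically as pH rises.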

Antigenicity and allergenicity:

  Abie Pro Peptide Antibody Design (Chang Bioscience)

Allergenicity servers: AllerTOP (Reference: Dimitrov, I. et al. 2013. BMC Bioinformatics 14(Suppl 6): S4), AlgPred - prediction of allergenic proteins and mapping of IgE epitopes (Reference: Saha, S. and Raghava, G.P.S. 2006. Nucleic Acids Research 34: W202-W209), and SDAP - Structural Database of Allergenic Proteins (Reference: Ivanciuc, O. et al. 2003. Nucleic Acids Res. 31: 359-362).

SBS EpiToolKit - provides a collection of methods from computational immunology for the prediction of MHC ligands or potential T-cell epitopes. Additionally, SNEPv2 extends epitope prediction with the ability to analyze the influence of protein polymorphisms on the immunogenicity of the resulting polymorphic peptides. (Reference: N.C. Toussaint & O. Kohlbacher. 2009. Nucl. Acids Res. 37 (Web server issue): W617-W622).

VIOLIN: Vaccine Investigation and OnLine Information Network - allows easy curation, comparison and analysis of vaccine-related research data across various human pathogens. VIOLIN is expected to become a centralized source of vaccine information and to provide investigators in basic and clinical sciences with curated data and bioinformatics tools for vaccine research and development. VBLAST: Customized BLAST Search for Vaccine Research allows various search strategies against 77 genomes of 34 pathogens. (Reference: He, Y. et al. 2014. Nucleic Acids Res. 42(Database issue):D1124-32).

Solubility and crystallizability:

PROSO and PROSO II - are sequence-based PROtein SOlubility evaluators which try to answer the following question: "Which of my cloned proteins have the best/worst chances to be soluble upon heterologous expression?" (Reference: Smialowski P et al. 2007. Bioinformatics 23:2536-2542 & Smialowski P et al. 2012. FEBS J. 279: 2192-2200).

ESPRESSO (EStimation of PRotein ExpreSsion and SOlubility) - is a sequence-based predictor for estimating protein expression and solubility for three different protein expression systems: in vivo Escherichia coli, Brevibacillus, and wheat germ cell-free. (Reference: Hirose S, & Noguchi T. 2013. Proteomics. 13:1444-1456).

SABLE - Accurate sequence-based prediction of relative Solvent AccessiBiLitiEs, secondary structures and transmembrane domains for proteins of unknown structure. (Reference: Adamczak R et al. 2004. Proteins 56:753-767).

SPpred (Soluble Protein prediction) (Bioinformatics Center, Institute of Microbial Technology, Chandigarh, India) - is a web server for predicting the solubility of a protein on overexpression in E. coli. The prediction is made by a hybrid SVM model trained on PSSM profiles generated by PSI-BLAST searches of the 'nr' protein database and on split amino acid composition.

SECRET - is a SEquence-based CRystallizability EvaluaTor which tries to answer the following questions: "What is the chance that my soluble protein will crystallize?" & "Which of my soluble proteins have the best/worst chances to crystallize?" (Reference: Smialowski P et al. 2006. Proteins 62: 343-355).

Surface Entropy Reduction prediction (SERp) - this exploratory tool aims to aid the identification of sites that are most suitable for mutations designed to enhance crystallizability by a Surface Entropy Reduction approach. (Reference: Goldschmidt L. et al. 2007. Protein Science. 16:1569-1576).

CRYSTALP2 - for in-silico prediction of protein crystallization propensity (Reference: Kurgan L, et al. 2009. BMC Structural Biology 9: 50); and PPCpred - sequence-based prediction of propensity for production of diffraction-quality crystals, production of crystals, purification and production of the protein material. (Reference: M.J. Mizianty & L. Kurgan. 2011. Bioinformatics 27: i24-i33).

Antimicrobial peptides, vaccines and toxins:

APD: Antimicrobial Peptide Database. (Reference: Wang, Z. and Wang, G. 2004. Nucl. Acids Res. 32: D590-D592).

 Jenner Predict: Prediction of Protein Vaccine Candidates - submit your own sequence or select from a huge array of bacterial genomes.

VirulentPred - is an SVM-based method to predict bacterial virulent protein sequences, which can be used to screen for virulent proteins in proteomes. Together with experimentally verified virulent proteins, several putative, non-annotated and hypothetical protein sequences have been predicted to be high-scoring virulent proteins by the prediction method. (Reference: Garg A & Gupta G. 2008. BMC Bioinformatics 9: 62).

The Type III Secretion System (T3SS) is an essential mechanism for host-pathogen interaction in the infection process. The proteins secreted through the T3SS machinery of many Gram-negative bacteria are known as T3SS effectors (T3SEs). These can either be localized subcellularly in the host, or be part of the needle tip of the T3SS that interacts directly with the host membrane to bring other effectors into the target cell. T3SEdb is an effort to assemble a comprehensive database of all experimentally determined and putative T3SEs in a web-accessible site. BLAST search is available. (Reference: Tay DM et al. 2010. BMC Bioinformatics. 11 Suppl 7:S4).

Effective (University of Vienna, Austria & Technical University of Munich, Germany) - Bacterial protein secretion is the key virulence mechanism of symbiotic and pathogenic bacteria. In this process, effector proteins are transported from the bacterial cytosol into the extracellular medium or directly into the eukaryotic host cell. The Effective portal provides precalculated predictions of bacterial effectors in all publicly available pathogenic and symbiotic genomes, and also lets users predict effectors in their own protein sequence data.

T3SS - Type III secretion system effector prediction (Reference: Löwer M, & Schneider G. 2009. PLoS One. 4:e5917. Erratum in: PLoS One. 2009;4(7)).

SIEVE Server is a public web tool for the prediction of type III secreted effectors. The SIEVE Server scores potential secreted effectors from genomes of bacterial pathogens with type III secretion systems using a model learned from known secreted proteins. It requires only the sequences of the proteins to be screened and returns a conservative probability that each input protein is a type III secreted effector. (Reference: McDermott JE et al. 2011. Infect Immun. 79:23-32).

Circular dichroism:

Circular Dichroism (Birkbeck College, School of Crystallography, England) - DICHROWEB is an interactive web site which allows the deconvolution of data from circular dichroism spectroscopy experiments. It offers an interface to a range of deconvolution algorithms (CONTINLL, SELCON3, CDSSTR, VARSLC, K2D).

K2D2: Prediction of percentages of protein secondary structure from CD spectra - allows analysis of 41 CD spectrum data points for the 200-240 nm range or 51 data points for the 190-240 nm range. (Reference: Perez-Iratxeta C & Andrade-Navarro MA. 2008. BMC Structural Biology 8:25).

K2D3 is a web server to estimate the α-helix and β-strand content of a protein from its circular dichroism spectrum. K2D3 uses a database of theoretical spectra derived with Dichrocalc. (Reference: Louis-Jeune C et al. 2012. Proteins: Structure, Function, & Bioinformatics 80: 374–381).

Cysteine Residues:

DiANNA - will predict cysteine oxidation state (76% accuracy), cysteine pairs (81% accuracy) and disulfide bond connectivity (86% accuracy). (Reference: F. Ferrè & P. Clote. 2005. Nucl. Acids Res. 33: W230-W232).

CYSREDOX (Rockefeller University, U.S.A.) and CYSPRED (CIRB Biocomputing Group, University of Bologna, Italy) calculate the redox state of cysteine residues in proteins.

Hydrophobicity Plotter (Innovagen) and Protein Hydroplotter - select under Tools (ProteinLounge, San Diego, CA).
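Hydrophobicity plots of this kind are usually sliding-window averages of a per-residue scale. The sketch below assumes the Kyte-Doolittle hydropathy scale and a window of 9 residues. Both are assumptions made for illustration, since the scales and defaults actually used by the plotters above are not stated here, and the function name is hypothetical.

```python
# ASSUMPTION: Kyte-Doolittle hydropathy values; the listed plotters may
# offer other scales and other default window sizes.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Return the mean hydropathy of every full window along the sequence."""
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    # One value per window start position: len(seq) - window + 1 values.
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]
```

Plotting the returned list against the window position gives the familiar profile. Sustained positive stretches suggest hydrophobic segments, and a longer window smooths the curve at the cost of resolution.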

Proteolysis and Mass Spectrometry:

Proteolysis - PeptideCutter (ExPASy, Switzerland) predicts cleavage sites for both enzymes and chemicals. An alternative proteolysis tool is Mobility_plot 4.1 (Advanced Proteolytic Fingerprinting, IGH, France).
For more sophisticated protein analysis involving mass spectrometry, ExPASy has introduced the FindMod tool, which predicts potential post-translational modifications in peptides, and the GlycoMod tool, which predicts the possible oligosaccharide structures that occur on proteins from their experimentally determined masses.
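As an illustration of what a cleavage-site predictor does, here is a minimal in-silico trypsin digest. It assumes the common simplified rule (cut on the C-terminal side of K or R unless the next residue is P). PeptideCutter applies more detailed, enzyme-specific rules, so treat this as a sketch with a hypothetical function name rather than a reproduction of that tool.

```python
def trypsin_digest(seq):
    """Split a protein sequence into peptides using a simplified trypsin rule.

    ASSUMPTION: cleave after K or R, except when the next residue is P.
    Missed cleavages and other exceptions are ignored.
    """
    peptides = []
    start = 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    # Keep whatever remains after the last cleavage site.
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides
```

For example, `trypsin_digest("AKRPGKM")` returns `["AK", "RPGK", "M"]`: the R is not cleaved because it is followed by P. The masses of such peptides are what peptide-mapping searches compare against a measured spectrum.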

ProFound - is a tool for searching a protein sequence database using information from mass spectra of peptide maps. A Bayesian algorithm is used to rank the protein sequences in the database according to their probability of producing the peptide map. A simplified version can be accessed here (Rockefeller University, New York, U.S.A.). One cannot use one's own protein database.

PepFrag (Rockefeller University, New York, U.S.A.) - is a tool for searching protein or nucleotide sequences using information from fragmentation mass spectra of peptides.

ProteinProspector (University of California) - offers a wide variety of tools (e.g. MS-Fit, MS-Tag, MS-Seq, MS-Pattern, MS-Homology) for the protein mass spectrometrist.

Repeats:

Repeats in protein sequences can be discovered using Radar (Rapid Automatic Detection and Alignment of Repeats, European Bioinformatics Institute) or REPRO (Reference: George RA. & Heringa J. 2000. Trends Biochem. Sci. 25: 515-517). Other useful sites are FAIR (Indian Institute of Science, Bangalore) and Internal Repeat Finder (UCLA-DOE Institute for Genomics & Proteomics, U.S.A.).

REPPER (REPeats and their PERiodicities) - detects and analyzes regions with short gapless repeats in proteins. It finds periodicities by Fourier Transform (FTwin) and internal similarity analysis (REPwin). FTwin assigns numerical values to amino acids that reflect certain properties, for instance hydrophobicity, and gives information on corresponding periodicities. REPwin uses self-alignments and displays repeats that reveal significant internal similarities. They are complemented by PSIPRED and coiled coil prediction (COILS), making the server a useful analytical tool for fibrous proteins. (Reference: M. Gruber et al. 2005. Nucl. Acids Res. 33: W239-W243).
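The Fourier-transform idea behind FTwin (assign each residue a number, then look for strong periodic components) can be sketched as follows. This is not REPPER's implementation: the Kyte-Doolittle scale, the plain discrete Fourier transform over the whole sequence, and the function name are all assumptions made for illustration.

```python
import cmath

# ASSUMPTION: Kyte-Doolittle hydropathy values as the per-residue property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def dominant_period(seq, scale=KYTE_DOOLITTLE):
    """Return the period (in residues) with the largest spectral power."""
    n = len(seq)
    if n < 4:
        raise ValueError("sequence too short for a periodicity analysis")
    values = [scale[aa] for aa in seq]
    mean = sum(values) / n
    centered = [v - mean for v in values]  # remove the zero-frequency offset

    best_k, best_power = 1, -1.0
    # Frequency index k corresponds to a period of n / k residues.
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k
```

On an idealized heptad pattern such as `"LAALAAA" * 6`, where hydrophobic residues sit at the first and fourth positions, the strongest component has a period of 3.5 residues, the periodicity typically associated with coiled coils. A real analysis would use a sliding window rather than the whole sequence, so that local repeats are not averaged away.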

Two-dimensional gels:

JVirGel - calculation of virtual two-dimensional protein gels; creates virtual 2D proteomes from a huge list of eukaryotes & prokaryotes (or an individual protein). Two versions: html (limited) and Java applet (incredible, but you need to install the Java Runtime Environment). (Reference: K. Hiller et al. 2003. Nucl. Acids Res. 31: 3862-3865).

Draw Virtual Two-Dimensional Protein Gels (PRODORIC Net, Germany) - using your own protein sequence data or for different organisms. 

I-Mutant2.0: predictor of protein stability changes upon mutation - choose either a PDB reference number or paste your own protein. The answer (by email) indicates whether the protein is more or less stable, a fact which could be of use in designing "better" proteins. (Reference: E. Capriotti et al. 2005. Nucl. Acids Res. 33: W306-W310).

Metasite:

Scratch Protein Predictor (Institute for Genomics and Bioinformatics, University of California, Irvine) - programs include: ACCpro: Prediction of the relative solvent accessibility of protein residues; CMAPpro: Prediction of amino acid contact maps; COBEpro: Prediction of continuous B-cell epitopes; CONpro: Prediction of whether the number of contacts of each residue in a protein is above or below the average for that residue; DIpro: Prediction of disulphide bridges; DISpro: Prediction of disordered regions; DOMpro: Prediction of domains; SSpro: Prediction of protein secondary structure; SVMcon: Prediction of amino acid contact maps using Support Vector Machines; and 3Dpro: Prediction of protein tertiary structure (Ab Initio).