Molecular Biology JAVA and Perl Programs

A. DNA sequence analyses
B. Genomic analyses
C. Primer design
D. Microarray analyses
E. Protein analyses
F. Alignments
G. Motifs
H. Phylogeny
I. Miscellaneous
J. Graphic packages

DNA sequence analysis:

Sequence Manipulation Suite  - is an incredible set of programs for manipulating DNA and protein sequences. (Reference: P. Stothard. (2000).  Biotechniques 28: 1102-1104)

SeWeR (SEquence analysis using WEb Resources)  is an integrated portal to common web-based services in bioinformatics. (Reference: M.K. Basu. (2001). Bioinformatics. 17: 577-578)

JEMBOSS (Java version of the European Molecular Biology Open Source Software Suite) (Open Bio Foundation, England). (Reference: T. Carver & A. Bleasby. 2003. Bioinformatics 19: 1837-1843). Can also be downloaded.

Initiation codon preference: use Inidon (Andre Villegas, Univ. Waterloo, Canada) - this Java-based program reads GenBank *.ffn files (FASTA-formatted gene files) and reports the count and percentage usage of each start codon. The *.ffn files can be downloaded for sequenced genomes from the GenBank genome site. For bacteriophage and other smaller genomes, locate the file using the "search genome" function at NCBI and select "Views - coding regions."  From the next screen use "Save - FASTA nucleotide."
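The kind of tally described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Inidon's own code: the function names `read_fasta` and `start_codon_usage` are hypothetical, and the sketch assumes each FASTA record holds one gene written in its coding orientation, so that the first three bases are the start codon.

```python
from collections import Counter


def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def start_codon_usage(fasta_text):
    """Return {codon: (count, percentage)} for the first codon of each gene.

    Assumption: every record is a gene in coding orientation.
    """
    counts = Counter(
        seq[:3].upper()
        for _, seq in read_fasta(fasta_text)
        if len(seq) >= 3
    )
    total = sum(counts.values())
    return {codon: (n, 100.0 * n / total) for codon, n in counts.items()}


if __name__ == "__main__":
    sample = ">g1\nATGAAA\nTTT\n>g2\natgCCC\n>g3\nGTGAAA\n>g4\nTTGAAA\n"
    for codon, (n, pct) in sorted(start_codon_usage(sample).items()):
        print(f"{codon}\t{n}\t{pct:.1f}%")
```

To use it on a real file, read the *.ffn file into a string and pass it to `start_codon_usage`.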

blaSTOR (Microgen; University of Oklahoma Health Sciences Center; U.S.A.) - is a freely available database built in Microsoft Access and is used to BLAST, store and analyze nucleotide and protein sequences. It has an easy-to-use graphical interface that allows users to perform BLAST operations and organize their data quickly and effectively. It requires ActivePerl.

prfectBLAST is a multiplatform (MS Windows, Mac OS X, Linux) graphical user interface (GUI) for the stand-alone BLAST+ suite of applications. It allows researchers to run nucleotide or amino acid sequence similarity searches against locally stored public (or user-customized) databases. It does not require any dependencies or installation and can be used from a portable flash drive. (Reference: Santiago-Sotelo P, Ramirez-Prado JH. 2012. Biotechniques. 53(5):299-300).

Genome Viewers:

BugView - is a genome browser for comparing the arrangement of genes on a pair of related genomes, and can also be used to view individual genomes. (Reference: D.P. Leader. (2004) Bioinformatics 20: 129-130).

CGView - this program uses files such as an NCBI ptt file to generate high quality, zoomable maps of circular genomes. CGView converts the input into a graphical map (PNG, JPG, or Scalable Vector Graphics format), complete with labels, a title, legends, and footnotes. In addition to the default full view map, the program can generate a series of hyperlinked maps showing expanded views. The linked maps can be explored using any web browser, allowing rapid genome browsing, and facilitating data sharing. The feature labels in maps can be hyperlinked to external resources, allowing CGView maps to be integrated with existing web site content or databases. (Reference: P. Stothard, & D.S. Wishart. (2005). Bioinformatics 21: 537 - 539) .

Mauve - Multiple Genome Alignments (School of Veterinary Medicine, Univ. Wisconsin-Madison, U.S.A.) - this program is designed for efficient multiple genome alignment. It is ideally suited for closely related genomes where large-scale events such as rearrangements and deletions have occurred. (Reference: A.C.E. Darling et al. (2004). Genome Research 14: 1394-1403)

ACT - Artemis Comparison Tool (The Wellcome Trust Sanger Institute, United Kingdom) - allows an interactive visualization of comparisons between complete genome sequences and associated annotations. This brilliant tool is based on Artemis. (Reference: T. Carver et al. 2005. Bioinformatics 21: 3422-3423)

Gepard (GEnome PAir - Rapid Dotter) allows the calculation of dotplots even for large sequences like chromosomes or bacterial genomes (Reference: J. Krumsiek et al. 2007. Bioinformatics 23: 1026-1028).
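For readers unfamiliar with dotplots, the basic idea can be sketched as follows. This is a naive word-match version written for illustration only; it does not reproduce Gepard's own algorithm or its performance on chromosome-sized input, and the function names `dotplot` and `render` and the `word` parameter are assumptions made for this example.

```python
def dotplot(seq_a, seq_b, word=3):
    """Return the set of (i, j) where seq_a[i:i+word] == seq_b[j:j+word]."""
    # Index every word of seq_b by its start positions.
    index = {}
    for j in range(len(seq_b) - word + 1):
        index.setdefault(seq_b[j:j + word], []).append(j)
    # Look up every word of seq_a in that index.
    hits = set()
    for i in range(len(seq_a) - word + 1):
        for j in index.get(seq_a[i:i + word], ()):
            hits.add((i, j))
    return hits


def render(seq_a, seq_b, hits):
    """Draw the plot as text: rows follow seq_a, columns follow seq_b."""
    rows = []
    for i in range(len(seq_a)):
        rows.append(
            "".join("*" if (i, j) in hits else "." for j in range(len(seq_b)))
        )
    return "\n".join(rows)


if __name__ == "__main__":
    a = "ACGTACGTTT"
    b = "TTACGTACGA"
    print(render(a, b, dotplot(a, b, word=3)))
```

Runs of hits along a diagonal indicate regions shared by the two sequences; a larger `word` value suppresses short chance matches.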

Sockeye (Michael Smith Genome Sciences Centre, Canada) - this application is designed to assemble, view and work with genomic information in a 3D environment. This program links to the Ensembl database and displays genomic features along DNA sequences.

Apollo Genome Browser (collaborative project between the Berkeley Drosophila Genome Project and Ensembl) - displays genomic sequence and any associated start and stop codons; annotations can be created and edited; the zoomable and scrollable feature display goes down to sequence level and is optimized for large regions of genome; features can be searched by name or sequence string.

Argo Genome Browser is the Broad Institute's (Cambridge, MA, U.S.A.) production tool for visualizing and manually annotating whole genomes. This application provides: display of sequence and annotation tracks (from FASTA, GenBank, GFF, BLAST, and Genscan files); interactive zoom from megabase to nucleotide resolution; editing of individual features, supporting manual annotation; and an intuitive and elegant comparative perspective (ComBo) for viewing dot plots of multiple aligned sequences.

GenoViz - is an open source, Java-based framework designed for rapid assembly of visualization software applications for genomics. The GenoViz SDK framework provides a mechanism for incorporating adaptive, dynamic zooming into applications, a desirable feature of genome viewers. Visualization capabilities of the GenoViz SDK include automated layout of features along genetic or genomic axes; support for user interactions with graphical elements (Glyphs) in a map; and a variety of Glyph sub-classes that promote experimentation with new ways of representing data in graphical formats. (Reference: G.A. Helt et al. 2009. BMC Bioinformatics 10:266)

DNAPlotter - is an interactive Java application for generating circular and linear representations of genomes. Making use of the Artemis libraries to provide a user-friendly method of loading in sequence files (EMBL, GenBank, GFF) as well as data from relational databases, it filters features of interest to display on separate user-definable tracks. It can be used to produce publication quality images for papers or web pages. (Reference: Carver, T. et al. 2008. Bioinformatics 25:119-120)

PCR primer design:

PerlPrimer - calculates primer melting temperature using J. SantaLucia's extensive nearest-neighbour thermodynamic parameters. To adjust for the salt conditions of the PCR, PerlPrimer uses the empirical formula derived by von Ahsen, et al. (2001) and allows the user to specify the concentration of Mg2+, dNTPs and primers, or use standard PCR conditions. The result is an accurate prediction of primer melting temperature, which helps maximize product yield during amplification. It checks for possible primer-dimers and allows BLAST searches at NCBI or on a local server. In addition, results can be saved or optionally exported in a tab-delimited format that is compatible with most spreadsheet applications. (Reference: O.J. Marshall (2004) Bioinformatics 20: 2471-2472).

Picky is an oligo microarray design program that identifies probes that are unique and specific to the input sequences. These calculations are based on parameters entered by the user, including optimal probe length, ideal percentage of guanine and cytosine content, target melting temperature, salt concentration and the maximum length over which a target sequence may match any non-target sequence. (Reference: H.-H. Chou et al. (2004) Bioinformatics 20: 2893-2902). Download genome *.ffn files from GenBank for use with this program. N.B. Unfortunately these files do not include the gene names, only their coordinates.

Microarray analysis:

MAExplorer (MicroArray Explorer) - is a tool for data mining gene expression patterns. (Reference: P.F. Lemkin et al. (2000). Nucleic Acids Research 28: 4452-4459).

MicroArray Genome Imaging & Clustering Tool (MAGIC Tool) - A teaching resource developed at Davidson College (U.S.A.) by Laurie Heyer and her undergraduate students (Reference: L. J. Heyer et al. 2005. Bioinformatics 21: 2114 - 2115).

QPCR - is a versatile web-based Java application that allows users to store, manage, and analyze data from relative quantification qPCR experiments. It comprises a parser to import generated data from qPCR instruments and includes a variety of analysis methods to calculate cycle-threshold and amplification efficiency values. The analysis pipeline includes technical and biological replicate handling, incorporation of sample or gene specific efficiency, normalization using single or multiple reference genes, inter-run calibration, and fold change calculation. (Reference: S. Pabinger et al. 2009. BMC Bioinformatics 10:268)
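The last steps of such a pipeline (gene-specific efficiency, normalization to one or more reference genes, fold change) can be sketched as below. This uses a widely used efficiency-corrected model with a geometric mean over reference genes; it is an assumption that this matches what QPCR itself computes, and the function names and the tuple layout `(efficiency, ct_control, ct_sample)` are hypothetical.

```python
from math import prod


def relative_quantity(efficiency, ct_control, ct_sample):
    """Relative quantity of one gene: E ** (Ct_control - Ct_sample).

    efficiency is the amplification factor per cycle (2.0 = perfect doubling).
    """
    return efficiency ** (ct_control - ct_sample)


def fold_change(target, references):
    """Efficiency-corrected fold change of a target gene.

    target and each entry of references are (efficiency, ct_control, ct_sample)
    tuples. The target's relative quantity is divided by the geometric mean
    of the reference genes' relative quantities.
    """
    rq_target = relative_quantity(*target)
    rq_refs = [relative_quantity(*ref) for ref in references]
    normalizer = prod(rq_refs) ** (1.0 / len(rq_refs))
    return rq_target / normalizer


if __name__ == "__main__":
    target = (2.0, 25.0, 23.0)        # target crosses threshold 2 cycles earlier
    references = [(2.0, 20.0, 20.0)]  # reference gene unchanged
    print(fold_change(target, references))
```

With a single reference gene and an efficiency of 2.0 for both genes, this reduces to the familiar 2^(-ddCt) calculation.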

Protein analysis:

Friend - allows one to interactively manipulate, visualize and analyze hundreds of proteins, their spatial structures, amino-acid and nucleotide sequences and alignments, active and binding sites, fragments, and domains in protein and gene families, as well as to display large macromolecular complexes such as ribosomes or viruses (Reference: A. Abyzov et al. 2005. Bioinformatics 21: 3677-3678).

GelScape - allows analysis of standard 1D and 2D protein gels. It uses advanced concepts in "network computing" enabling one to upload, download, save, print, view, annotate, edit, label, compare and spot mark just about any gel image. GelScape also allows one to calculate spot intensity, prepare HTML image maps and archive annotated gels to a public database (GelBank) - all using an easy-to-use, intuitive browser interface. (Reference: N. Young et al. (2004). Bioinformatics 20: 976 - 978)

FLICKER - is an open-source stand-alone computer program for visually comparing 2D gel images. (Reference: P.F. Lemkin and G. Thornwall. (2002). In: J. Walker (ed), The Protein Protocols Handbook, Second edition; Humana Press, Totowa, NJ)

TMRPres2D (TransMembrane protein Re-Presentation in 2 Dimensions tool) - takes data from a variety of protein folding servers and creates uniform, two-dimensional, high analysis graphical images/models of alpha-helical or beta-barrel transmembrane proteins. (Reference: I.C. Spyropoulos et al. (2004) Bioinformatics 20: 3258-3260).

MPEx (Membrane Protein Explorer) (Stephen White Laboratory, University of California Irvine, U.S.A.) - is a tool for exploring the topology and other features of membrane proteins by means of hydropathy plots using thermodynamic principles. MPEx can also be installed on your computer as a stand-alone or Web Start application.

TopDraw - is a sketchpad for drawing topology cartoons of proteins. For Windows: install Tcl/Tk v8 or higher, then download TopDraw and save it as TopDraw.tcl. (Reference: C.S. Bond. 2003. Bioinformatics 19: 311-312).

BALLView - is a molecular viewer and modeling tool which combines state-of-the-art visualization capabilities with powerful modeling functionality including implementations of force field methods and continuum electrostatics models. (Reference: Moll et al. 2006. Bioinformatics 22: 365-366).

JMV (Java Molecular Viewer) - JMV is a molecular viewer written in Java and Java3D. JMV is designed to be an easy-to-use platform neutral molecular visualization tool, which can be used standalone or integrated into other programs. JMV provides several molecular representations, multiple coloring styles, lighting controls, and stereoscopic rendering capabilities. JMV loads PDB format molecular structure files over the web, from the RCSB protein databank, from BioCoRE filesystems, and from local filesystems.

JMVS4 - Java3D Molecular Visualisation System is an open-source freeware molecular visualisation tool developed in Java using a combination of Java 2, Java SWING and the Java3D API. It loads Protein Data Bank (PDB) files and converts the data into a 3D representation of the molecule allowing the user to view the molecule in a variety of display modes such as Balls, Sticks and Ribbons and colour schemes such as CPK Amino, Group and Structure. The user can also interact with the model by rotating, scaling and translating the molecule as well as picking atoms or bonds to view data.

QuteMol is an open source (GPL), interactive, high quality molecular visualization system which exploits current GPU capabilities through OpenGL shaders to offer an array of innovative visual effects (Ball and Sticks, Space-Fill and Liquorice visualization modes & Depth Aware Silhouette Enhancement). Its visualization techniques are aimed at improving clarity and giving an easier understanding of the 3D shape and structure of large molecules or complex proteins. (Reference: M. Tarini et al. 2006. IEEE Transactions on Visualization & Computer Graphics 12[5]).

Sequence Alignments:

QAlign - combines algorithms for fast progressive (Clustal W) and accurate simultaneous multiple alignment with a versatile editor and a dynamic phylogenetic analysis to provide a convenient graphical user interface as a standalone package for multiple platforms. In addition, QAlign includes iterative and consistency-based state-of-the-art extensions to these traditional strategies, which recently have shown much promise (T-coffee and DIALIGN). (Reference:  S.M. Rothgänger et al. (2003). Bioinformatics 19: 1592-1593)
Panta rhei (QAlign2) - extends QAlign with several features. Major redesigns of the user interface, for instance, allow users to flexibly compose views for multiple projects. The new sequence viewer handles datasets with arbitrarily many and arbitrarily large sequences that may still be edited by guided block moving. More distance-based algorithms are available to interactively reconstruct phylogenetic trees, which can now also be zoomed and navigated graphically. (Reference: Sammeth, M. et al. 2006. Bioinformatics 22: 889-890)


STRAP (Structural Alignment Program for Proteins) - an editor for multiple sequence alignments (Protein Structure Theory Group, Institute of Biochemistry, Charité Humboldt University, Germany). It is an interactive, extendable and scriptable editor for large protein alignments which integrates amino acid sequence, secondary structure, 3D-structure and genomic- and mRNA-sequence. (Reference: C. Gille et al. (2003) Bioinformatics 19: 2489-2491).

PFAAT (Protein Family Alignment Annotation Tool) (Neogenesis Drug Discovery and Pfizer, Inc.) is a protein sequence alignment application designed to facilitate the analysis, curation, and annotation of large protein sequence families. Key features include the ability to align collections of sequences, cluster and/or group sequences into subfamilies, analyze sequences based on a number of similarity criteria, visualize protein structure, and annotate sequences and specific residue positions with text descriptions. (Reference: J.M. Johnson et al. (2003) Bioinformatics 19: 544-545).

Jalview - Analysis and Manipulation of Multiple Sequence Alignments (M. Clamp, J. Cuff, S. Searle & G.  Barton. EMBL-EBI, United Kingdom) 
JAligner (Ahmed Moustafa) - is an open source Java implementation of the Smith-Waterman dynamic programming algorithm for local pairwise alignment of biological sequences.
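As a reminder of how Smith-Waterman local alignment works, here is a minimal sketch. It is not JAligner's code: it uses a simple match/mismatch score and a linear gap penalty, whereas production tools typically use substitution matrices and affine gap penalties. The function name and default scores are assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; every cell is floored at 0 so alignments can restart.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(0, diag, up, left)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + step:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("TTACGTT", "GGACGGG"))
```

The floor at zero is what makes the alignment local: poorly matching flanks are dropped instead of being penalized across the full length of both sequences.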

ISHAN - is a flexible platform for performing fast homology analysis and molecular phylogenetic studies on proteins and DNA sequences, by bringing together all the relevant tools under a single package. Since the framework facilitates speedy alignments and compilation of data, evolutionary tracing of proteins and genes can be carried out more quickly using ISHAN. (Reference: P. Shil et al. 2006. In Silico Biol. 6: 0035)

BigFoot - extends a combined alignment and phylogenetic footprinting approach to analyze larger amounts of sequence data using MCMC. It implements an MCMC sampling approach to jointly estimate a DNA multiple sequence alignment and the locations of slowly evolving regions that may represent subsequences undergoing purifying selection. While the insertion-deletion model is fixed, the program can pair any implemented substitution model with it. (Reference: R. Satija et al. 2009. BMC Evolutionary Biol. 9: 217)

 SynteBase/SynteView - allows a fast and easy visualization of conservation of gene adjacency in many prokaryotic genomes for which orthology and neighbourhood data have been computed and stored in SynteBase, a dedicated relational database. (Reference: Lemoine, F. et al. 2008. BMC Bioinformatics, 9: 536)

Phylogeny:

Jevtrace - is an implementation of the evolutionary trace method. The software expands on the evolutionary trace by allowing manipulation of the input data and parameters of analysis, and presents a number of novel tree-inspired analyses of protein families. Jevtrace includes a multivalent graphical browser for multiple sequence alignment, phylogeny, and structure, as well as underlying object and algorithmic infrastructure. Structure visualization is delegated to WebMol, allowing live mapping of results onto protein structures. The combination of Jevtrace+WebMol can be used generically as a viewer for combinations of molecular phylogeny, sequence alignment, and structure data. (Reference: M.P. Joachimiak & F.E. Cohen. 2002. Genome Biology 3: research0077.1-0077.12).

jPHYDIT - Phylogenetic Editor for JAVA - a molecular sequence editor specially designed for phylogenetic analysis of sequences such as ribosomal RNA. It displays secondary/tertiary structure pairings of ribosomal RNA molecules while users edit nucleotide sequences. This allows users to perform "alignment based on rRNA secondary structure", which is required for precise phylogenetic inference. (Reference: J-S. Jeon et al. 2005. Bioinformatics 21: 3171-3173).

ATV (A Tree Viewer) is a Java tool for the visualization of annotated phylogenetic trees. ATV reads standard "New Hampshire" format tree files (as produced by all major phylogenetic analysis software).

 SNAP Workbench - this program manages and coordinates a series of analysis programs for making inferences on population processes. It allows the user to customize the implementation of complex console programs and functions for the purpose of automating and enhancing data exploration. The workbench facilitates population parameter estimation by ensuring that the assumptions and program limitations of each analysis method are met and by providing a step-by-step methodology to effectively integrate both summary-statistic methods and coalescent-based population genetic models. (Reference: E.W. Price & I. Carbone (2005) Bioinformatics 21: 402-404).

MatGAT generates similarity/identity matrices using protein or DNA sequences.  (Reference: J.J. Campanella et al. (2003). BMC Bioinformatics. 4: 29).

Phylogenetic Tree Reconciler - finds gene duplications in phylogenetic trees in order to improve gene function inferences. The algorithm is applicable to realistic data, especially n-ary species trees and unrooted phylogenetic trees. The algorithm also takes branch lengths into account. (Reference: J-F. Dufayard et al. 2005. Bioinformatics 21: 2596 - 2603).

Pintail - is a tool for identifying anomalies such as chimeras within 16S rDNA sequences.  In essence, the program works by comparing evolutionary distances between a query and subject sequence over the length of the 16S rRNA gene (small subunit rRNA), by employing a sampling window of specified size, progressing a fixed number of bases at a time along the length of the gene. (Reference: K.E. Ashelford et al. 2005. Appl. Environ. Microbiol. 71: 7724-7736).

TreeIllustrator - is a program for displaying and manipulating phylogenetic trees. It gives you powerful means to customise your phylogenetic trees and compare them with the current classification of organisms. It handles trees with up to thousands of leaves; imports NEXUS (PAUP*) and NEWICK (PHYLIP) files; permits different tree shapes: radial, radial logarithmic, phylogram, rectangular cladogram, radial cladogram and slanted cladogram; and allows export to bitmap (JPEG) and vector (PostScript) formats. (Reference: G. Trooskens et al. 2005. Bioinformatics 21: 3801-3802).

CTree - has been designed for the quantification of clusters within viral phylogenetic tree topologies. Clusters are stored as individual data structures from which statistical data, such as the Subtype Diversity Ratio (SDR), Subtype Diversity Variance (SDV) and pairwise distances can be extracted. (Reference: J. Archer & D.L. Robertson. 2007. Bioinformatics 23: 2952-2953)

Other:

JavaScript DNA Translator - A small JavaScript program written by William L Perry III that permits one to layer amino acid sequence on DNA sequence. Available from BioTechniques Software Library.
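Layering a translation under a DNA sequence can be sketched as follows. This is an illustration in Python, not the program's own code; it assumes the standard genetic code, places each one-letter amino acid code under the middle base of its codon, and marks unrecognized codons with "X". The names `translate` and `layered` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna, frame=0):
    """Translate complete codons of dna starting at offset frame (0, 1 or 2)."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(frame, len(dna) - 2, 3)
    )


def layered(dna, frame=0):
    """Return the DNA with its translation aligned on the line beneath it."""
    line = [" "] * len(dna)
    for k, aa in enumerate(translate(dna, frame)):
        line[frame + 3 * k + 1] = aa  # middle base of the k-th codon
    return dna + "\n" + "".join(line).rstrip()


if __name__ == "__main__":
    print(layered("ATGGCCTAA"))
```

Calling `layered` with `frame` set to 1 or 2 shows the other two forward reading frames.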

WebPlasmid (Rashmi and Varun Singh) - is a Java-based program which allows mapping of restriction sites, label dragging, and clockwise and anticlockwise arrows.

Motifs:

Two Sample Logo - detects and displays statistically significant differences in position-specific symbol compositions between two sets of multiple sequence alignments. In a typical scenario, two groups of aligned sequences will share a common motif but will differ in their functional annotation. Also available as an online tool. (Reference: Vacic, V. et al. 2006. Bioinformatics 22: 1536-1537)

Graphic packages:

ImageJ - is a public domain Java image processing program inspired by NIH Image. It can display, edit, analyze, process, save and print 8-bit, 16-bit and 32-bit images. It can read many image formats including TIFF, GIF, JPEG, etc. It can calculate area and pixel value statistics of user-defined selections. It can measure distances and angles. It can create density histograms and line profile plots. It supports standard image processing functions such as contrast manipulation, sharpening, smoothing, edge detection and median filtering.