Molecular biology Java programs

Molecular Biology JAVA and Perl Programs

A. DNA sequence analyses
B. Genomic analyses
C. Primer design
D. Microarray analyses
E. Protein analyses
F. Alignments
G. Motifs
H. Phylogeny
I. Miscellaneous
J. Graphic packages

DNA sequence analysis:

FastQC (Simon Andrews, Bioinformatics Group, Babraham Institute, UK) - aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines. It provides a modular set of analyses which you can use to give a quick impression of whether your data has any problems of which you should be aware before doing any further analysis. The main functions of FastQC are to (a) import data from BAM, SAM or FastQ files (any variant), (b) provide a quick overview to tell you in which areas there may be problems, and (c) present summary graphs and tables to quickly assess your data.
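To illustrate the kind of check such a quality control tool performs, here is a minimal Java sketch that averages Phred quality scores at each read position. It is not FastQC code: the class and method names are hypothetical, and it assumes Phred+33 encoded quality strings that have already been pulled out of a FastQ file.

```java
import java.util.Arrays;
import java.util.List;

class PerBaseQuality {
    // Assumption: Phred+33 (Sanger / Illumina 1.8+) quality encoding.
    static final int PHRED_OFFSET = 33;

    /** Returns the mean Phred score at each read position across all quality strings. */
    static double[] meanQualityPerPosition(List<String> qualityStrings) {
        int maxLen = 0;
        for (String q : qualityStrings) {
            maxLen = Math.max(maxLen, q.length());
        }
        double[] sums = new double[maxLen];
        int[] counts = new int[maxLen];
        for (String q : qualityStrings) {
            for (int i = 0; i < q.length(); i++) {
                sums[i] += q.charAt(i) - PHRED_OFFSET;
                counts[i]++;
            }
        }
        // Every position below maxLen is covered by at least one read.
        for (int i = 0; i < maxLen; i++) {
            sums[i] /= counts[i];
        }
        return sums;
    }

    public static void main(String[] args) {
        // Toy quality strings: 'I' = Q40, '5' = Q20, '#' = Q2.
        double[] means = meanQualityPerPosition(Arrays.asList("II5#", "I55"));
        for (int i = 0; i < means.length; i++) {
            System.out.printf("position %d: mean Q = %.1f%n", i + 1, means[i]);
        }
    }
}
```

A falling mean towards the 3' end of the reads is the sort of pattern a per-base quality plot makes visible.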

Sequence Manipulation Suite - is an incredible set of programs for manipulating DNA and protein sequences. (Reference: P. Stothard. (2000). Biotechniques 28: 1102-1104)

SeWeR (SEquence analysis using WEb Resources) is an integrated portal to common web-based services in bioinformatics. (Reference: M.K. Basu. (2001). Bioinformatics. 17: 577-578)

JEMBOSS (Java version of the European Molecular Biology Open Source Software Suite) (Open Bio Foundation, England). (Reference: T. Carver & A. Bleasby. 2003. Bioinformatics 19: 1837-1843). Can also be downloaded.

StarORF (Massachusetts Institute of Technology, U.S.A.) - the open reading frame finder.
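As a rough illustration of what an open reading frame finder does, the Java sketch below scans the three forward reading frames for an ATG followed by the first in-frame stop codon. It is not StarORF's algorithm: the class name, method names and the minCodons parameter are hypothetical, and the reverse strand is ignored for brevity.

```java
import java.util.ArrayList;
import java.util.List;

class OrfSketch {
    static boolean isStop(String codon) {
        return codon.equals("TAA") || codon.equals("TAG") || codon.equals("TGA");
    }

    /**
     * Finds ORFs (ATG through the first in-frame stop codon, inclusive)
     * in the three forward frames. minCodons counts the stop codon too.
     */
    static List<String> findOrfs(String dna, int minCodons) {
        String seq = dna.toUpperCase();
        List<String> orfs = new ArrayList<>();
        for (int frame = 0; frame < 3; frame++) {
            int start = -1; // index of the open ORF's ATG, or -1 if none is open
            for (int i = frame; i + 3 <= seq.length(); i += 3) {
                String codon = seq.substring(i, i + 3);
                if (start < 0 && codon.equals("ATG")) {
                    start = i;
                } else if (start >= 0 && isStop(codon)) {
                    if ((i + 3 - start) / 3 >= minCodons) {
                        orfs.add(seq.substring(start, i + 3));
                    }
                    start = -1;
                }
            }
        }
        return orfs;
    }

    public static void main(String[] args) {
        System.out.println(findOrfs("CCATGAAATAGGG", 3));
    }
}
```

A fuller tool would also scan the reverse complement and report coordinates rather than just the ORF sequences.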

prfectBLAST is a multiplatform (MS Windows, Mac OS X, Linux) graphical user interface (GUI) for the standalone BLAST+ suite of applications. It allows researchers to run nucleotide or amino acid sequence similarity searches against locally stored public (or user-customized) databases. It does not require any dependencies or installation and can be used from a portable flash drive. (Reference: Santiago-Sotelo P, Ramirez-Prado JH. 2012. Biotechniques. 53(5):299-300).

SSTAR, a Stand-Alone Easy-To-Use Antimicrobial Resistance Gene Predictor - combines a locally executed BLASTN search against a customizable database with an intuitive graphical user interface for identifying antimicrobial resistance (AR) genes from genomic data. Although the database is initially populated from a public repository of acquired resistance determinants (i.e., ARG-ANNOT), it can be customized for particular pathogen groups and resistance mechanisms. (Reference: T.J.B. de Man & B.M. Limbago. mSphere 10.1128/mSphere.00050-15)

SynteView allows a fast and easy visualization of conservation of gene adjacency in many prokaryotic genomes for which orthology and neighbourhood data have been computed and stored in SynteBase, a dedicated relational database. (Reference: Lemoine, F. et al. BMC Bioinformatics, 2008, 9: 536).

Genome Viewers:

IGV (Integrative Genomics Viewer) - is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations. It is available in multiple forms, including the original IGV (a Java desktop application) and IGV-Web (a web application). (Reference: Thorvaldsdóttir H et al. (2013) Briefings in Bioinformatics 14: 178-192).

BugView - is a genome browser for comparing the arrangement of genes on a pair of related genomes, and can also be used to view individual genomes. (Reference: D.P. Leader. (2004) Bioinformatics 20: 129-130).

CGView - this program uses files such as an NCBI ptt file to generate high quality, zoomable maps of circular genomes. CGView converts the input into a graphical map (PNG, JPG, or Scalable Vector Graphics format), complete with labels, a title, legends, and footnotes. In addition to the default full view map, the program can generate a series of hyperlinked maps showing expanded views. The linked maps can be explored using any web browser, allowing rapid genome browsing, and facilitating data sharing. The feature labels in maps can be hyperlinked to external resources, allowing CGView maps to be integrated with existing web site content or databases. (Reference: P. Stothard & D.S. Wishart. (2005). Bioinformatics 21: 537-539).

Gepard (GEnome PAir - Rapid Dotter) allows the calculation of dotplots even for large sequences like chromosomes or bacterial genomes (Reference: J. Krumsiek et al. 2007. Bioinformatics 23: 1026-1028).
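For readers unfamiliar with dotplots, the Java sketch below shows the basic idea: a dot is placed wherever a word of length w in one sequence is identical to a word in the other. The class name and the word-length parameter are hypothetical, and this naive all-against-all comparison is only practical for short sequences; it is not how Gepard reaches genome scale.

```java
class DotplotSketch {
    /**
     * dots[i][j] is true when the word of length w starting at a[i]
     * is identical to the word of length w starting at b[j].
     */
    static boolean[][] dotplot(String a, String b, int w) {
        int rows = Math.max(0, a.length() - w + 1);
        int cols = Math.max(0, b.length() - w + 1);
        boolean[][] dots = new boolean[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                dots[i][j] = a.regionMatches(i, b, j, w);
            }
        }
        return dots;
    }

    public static void main(String[] args) {
        // Print a small text dotplot; shared regions show up as diagonals.
        boolean[][] dots = dotplot("ACGTACGT", "TTACGTAA", 3);
        for (boolean[] row : dots) {
            StringBuilder line = new StringBuilder();
            for (boolean dot : row) {
                line.append(dot ? '*' : '.');
            }
            System.out.println(line);
        }
    }
}
```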

progressiveMauve - Multiple Genome Alignments - (Reference: A.E. Darling et al. 2010. PLoS ONE 5: e11147) - this program is designed for efficient multiple genome alignment. It is ideally suited for closely related genomes where large scale events such as rearrangements and deletions have occurred. The Mac version works while the PC version does not. It is also available as part of some Galaxy suites.

J-Circos - Circos plots are graphical outputs that display three dimensional chromosomal interactions and fusion transcripts. However, the Circos plot tool is not an interactive visualization tool, but rather a figure generator. This team has developed a Circos plot tool (J-Circos) that is an interactive visualization tool that can plot Circos figures, as well as being able to dynamically add data to the figure, and providing information for specific data points using mouse hover display and zoom in/out functions. Users can input data into J-Circos using flat data formats, as well as from the GUI. (Reference: An J et al. 2015. Bioinformatics 31:1463-1465).

Apollo Genome Browser (collaborative project between the Berkeley Drosophila Genome Project and Ensembl) - displays genomic sequence and any associated start and stop codons; annotations can be created and edited; zoomable and scrollable feature display, down to sequence level, optimized for large regions of a genome; searchable by feature name or sequence string.

GenoViz - is an open source, Java-based framework designed for rapid assembly of visualization software applications for genomics. The Genoviz SDK framework provides a mechanism for incorporating adaptive, dynamic zooming into applications, a desirable feature of genome viewers. Visualization capabilities of the Genoviz SDK include automated layout of features along genetic or genomic axes; support for user interactions with graphical elements (Glyphs) in a map; a variety of Glyph sub-classes that promote experimentation with new ways of representing data in graphical formats. (Reference: G.A. Helt et al. 2009. BMC Bioinformatics 10:266)

PCR primer design:

PerlPrimer - calculates primer melting temperature using J. SantaLucia's extensive nearest-neighbour thermodynamic parameters. To adjust for the salt conditions of the PCR, PerlPrimer uses the empirical formula derived by von Ahsen, et al. (2001) and allows the user to specify the concentration of Mg2+, dNTPs and primers, or use standard PCR conditions. The result is a highly accurate prediction of primer melting temperature, giving rise to a maximum yield of product when amplified. It checks for possible primer-dimers and allows BLAST searches at NCBI or on a local server. In addition, results can be saved or optionally exported in a tab-delimited format that is compatible with most spreadsheet applications. (Reference: O.J. Marshall (2004) Bioinformatics 20: 2471-2472).
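For comparison, the simplest back-of-the-envelope melting temperature estimate is the Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees Celsius, sketched below in Java. This is deliberately not the nearest-neighbour method with salt correction that PerlPrimer uses; it is a much cruder rule of thumb for short oligonucleotides, and the class and method names are hypothetical.

```java
class WallaceTm {
    /**
     * Wallace rule estimate of melting temperature in degrees Celsius:
     * 2 degrees per A/T base plus 4 degrees per G/C base.
     * Ignores salt, primer concentration and sequence context.
     */
    static int wallaceTm(String primer) {
        int at = 0;
        int gc = 0;
        for (char base : primer.toUpperCase().toCharArray()) {
            if (base == 'A' || base == 'T') {
                at++;
            } else if (base == 'G' || base == 'C') {
                gc++;
            } else {
                throw new IllegalArgumentException("Unexpected base: " + base);
            }
        }
        return 2 * at + 4 * gc;
    }

    public static void main(String[] args) {
        String primer = "ATGCATGCATGCATGC"; // toy 16-mer, 50% GC
        System.out.println(primer + " Tm ~ " + wallaceTm(primer) + " C");
    }
}
```

Because the rule ignores base stacking and buffer composition, its estimates can differ noticeably from nearest-neighbour predictions for the same primer.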

Picky is an oligo microarray design program that identifies probes that are unique and highly specific to the input sequences. These calculations are based on parameters entered by the user, including optimal probe length, ideal percentage of guanine and cytosine content, target melting temperature, salt concentration and the maximum length to which a target sequence matches any non-target sequence. (Reference: H.-H. Chou et al. (2004) Bioinformatics 20: 2893-2902). Download genome *.ffn files from GenBank for use with this program. N.B. Unfortunately, these files do not include the gene names, only their coordinates.

Microarray analysis:

MAExplorer (MicroArray Explorer) - is a tool for data mining gene expression patterns. (Reference: P.F. Lemkin et al. (2000). Nucleic Acids Research 28: 4452-4459).

MAGIC Tool (MicroArray Genome Imaging & Clustering Tool) - A teaching resource developed at Davidson College (U.S.A.) by Laurie Heyer and her undergraduate students. (Reference: L. J. Heyer et al. 2005. Bioinformatics 21: 2114 - 2115).

VAMPIRE microarray suite is a collection of Java tools designed to perform Bayesian statistical analysis of gene expression array data. (Reference: Hsiao, A et al. 2005. Nucleic Acids Res. 33: W627-32).

Protein analysis:

FLICKER - is an open-source stand-alone computer program for visually comparing 2D gel images. (Reference: P.F. Lemkin and G.Thornwall. (2002). J. Walker (ed), The Protein Protocols Handbook, Second edition; Humana Press, Totowa, NJ)

TMRPres2D (TransMembrane protein Re-Presentation in 2 Dimensions tool) - takes data from a variety of protein folding servers and creates uniform, two-dimensional, high analysis graphical images/models of alpha-helical or beta-barrel transmembrane proteins. (Reference: I.C. Spyropoulos et al. (2004) Bioinformatics 20: 3258-3260).

MPEx (Membrane Protein Explorer) (Stephen White Laboratory, University of California Irvine, U.S.A.) - is a tool for exploring the topology and other features of membrane proteins by means of hydropathy plots using thermodynamic principles. MPEx can also be installed on your computer as a stand-alone or Web Start application.

BALLView - is a molecular viewer and modeling tool which combines state-of-the-art visualization capabilities with powerful modeling functionality including implementations of force field methods and continuum electrostatics models. (Reference: Moll et al. 2006. Bioinformatics 22: 365-366).

JMV (Java Molecular Viewer) - JMV is a molecular viewer written in Java and Java3D. JMV is designed to be an easy-to-use platform neutral molecular visualization tool, which can be used standalone or integrated into other programs. JMV provides several molecular representations, multiple coloring styles, lighting controls, and stereoscopic rendering capabilities. JMV loads PDB format molecular structure files over the web, from the RCSB protein databank, from BioCoRE filesystems, and from local filesystems.

Sequence Alignments:

PFAAT Protein Family Alignment Annotation Tool (Neogenesis Drug Discovery and Pfizer, Inc.) is a protein sequence alignment application designed to facilitate the analysis, curation, and annotation of large protein sequence families. Key features include the ability to align collections of sequences, cluster and/or group sequences into subfamilies, analyze sequences based on a number of similarity criteria, visualize protein structure, and annotate sequences and specific residue positions with text descriptions. (Reference: J.M. Johnson et al. (2003) Bioinformatics 19: 544-545).

Jalview - Analysis and Manipulation of Multiple Sequence Alignments (M. Clamp, J. Cuff, S. Searle & G. Barton. EMBL-EBI, United Kingdom).

JAligner (Ahmed Moustafa) - is an open source Java implementation of the dynamic programming algorithm Smith-Waterman for biological local pairwise sequence alignment.
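The core of the Smith-Waterman recurrence can be sketched in a few lines of Java. The version below returns only the best local alignment score and uses a simple match/mismatch score with a linear gap penalty; these scoring choices and all names are illustrative assumptions, not JAligner's API or its scoring scheme.

```java
class SmithWatermanSketch {
    /**
     * Best local alignment score between a and b.
     * match is added for identical characters, mismatch and gap
     * (both normally negative) are added for substitutions and indels.
     */
    static int localAlignmentScore(String a, String b, int match, int mismatch, int gap) {
        int[][] h = new int[a.length() + 1][b.length() + 1]; // row 0 and column 0 stay 0
        int best = 0;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int diag = h[i - 1][j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch);
                int up = h[i - 1][j] + gap;
                int left = h[i][j - 1] + gap;
                // Clamping at zero is what makes the alignment local.
                h[i][j] = Math.max(0, Math.max(diag, Math.max(up, left)));
                best = Math.max(best, h[i][j]);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(localAlignmentScore("GGACGTCC", "TTACGTAA", 2, -1, -2));
    }
}
```

Recovering the aligned residues themselves would additionally require a traceback from the highest-scoring cell.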

BigFoot - extends a combined alignment and phylogenetic footprinting approach to analyze larger amounts of sequence data using MCMC. It implements an MCMC sampling approach to jointly estimate a DNA multiple sequence alignment and the locations of slowly evolving regions that may represent subsequences undergoing purifying selection. While the insertion-deletion model is fixed, the program can pair any implemented substitution model with it. (Reference: R. Satija et al. 2009. BMC Evolutionary Biol. 9:217)

Phylogeny:

OrthoANIu - Average nucleotide identity (ANI) is widely used to classify and identify bacteria. OrthoANI was developed to overcome the large differences in reciprocal ANI values associated with the original ANI algorithm. The OrthoANIu tool employs USEARCH instead of BLAST for its OrthoANI calculations, which substantially decreases computational time and so allows larger comparative studies. (Reference: Yoon, S. H. et al. (2017). Antonie van Leeuwenhoek. 110:1281–1286).

Orthologous Average Nucleotide Identity Tool (OAT) - OAT uses OrthoANI to measure the overall similarity between two genome sequences. ANI and OrthoANI are comparable algorithms: they share the same species demarcation cut-off at 95-96%, and large comparison studies have shown that both algorithms produce near identical reciprocal similarities. Details of the OrthoANI algorithm are given in Lee et al. (2015). OAT employs an easy-to-follow graphical user interface that allows researchers to calculate OrthoANI values between genomes of interest without having to use an unfamiliar command line environment. (Reference: Lee, I. et al. (2015). Int J Syst Evol Microbiol. 66: 1100-1103).

TreeIllustrator - is a program for displaying and manipulating phylogenetic trees. It gives you powerful means to customise your phylogenetic trees and compare them with the current classification of organisms. It handles trees with up to thousands of leaves; imports NEXUS (PAUP*) and NEWICK (PHYLIP) files; permits different tree shapes: radial, radial logarithmic, phylogram, rectangular cladogram, radial cladogram and slanted cladogram; and allows export to bitmap (JPEG) and vector (PostScript) formats. (Reference: G. Trooskens et al. 2005. Bioinformatics 21: 3801-3802).

JSpecies is an easy-to-use, biologist-centric software package designed to measure the probability that two genomes belong to the same species. (Reference: Richter M, & Rosselló-Móra R. 2009. Proc Natl Acad Sci U S A. 106:19126-31).

PareTree 1.0.2 - This command-line Java program allows users to ‘pare’ down their tree by removing unwanted leaves (tip-nodes), removing bootstrap information from the tree, removing branch lengths from the tree, or any combination of these. All of these functions can be accomplished in languages like R or Perl, but Java allows very large trees to be pared down quickly, efficiently, and easily!
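Two of these paring operations can be approximated with regular expressions on a Newick string, as in the Java sketch below. This is not PareTree's code: the class and method names are hypothetical, and it assumes unquoted taxon labels and numeric support values written directly after a closing parenthesis.

```java
class PareSketch {
    /** Removes ":<number>" branch lengths, e.g. ":0.05" or ":1e-3". */
    static String stripBranchLengths(String newick) {
        return newick.replaceAll(":[-+]?[0-9]*\\.?[0-9]+(?:[eE][-+]?[0-9]+)?", "");
    }

    /** Removes numeric internal-node labels (support values) that follow ")". */
    static String stripSupportValues(String newick) {
        return newick.replaceAll("\\)[0-9]+(?:\\.[0-9]+)?", ")");
    }

    public static void main(String[] args) {
        String tree = "((A:0.1,B:0.2)95:0.05,C:0.3);";
        String noLengths = stripBranchLengths(tree);
        System.out.println(noLengths);
        System.out.println(stripSupportValues(noLengths));
    }
}
```

Removing leaves is harder to do with pattern matching alone, because dropping a tip can leave a redundant internal node that has to be collapsed; that operation calls for a real tree parser.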

SplitsTree - is the leading application for computing unrooted phylogenetic networks from molecular sequence data. Given an alignment of sequences, a distance matrix or a set of trees, the program will compute a phylogenetic tree or network using methods such as split decomposition, neighbor-net, consensus network and super network methods, or methods for computing hybridization or simple recombination networks. (Reference: Huson DH. Bioinformatics. 1998;14(1): 68-73).

Dendroscope 3 - is a program for working with rooted phylogenetic trees and networks. It provides a number of methods for drawing and comparing rooted phylogenetic networks, and for computing them from rooted trees. (Reference: Huson DH & Scornavacca C (2012) Syst. Biol. 61(6):1061–1067).

TreeGraph 2 (Reference: Stöver BC & Müller KF. BMC Bioinformatics 2010, 11:7).

Other:

JavaScript DNA Translator - A small JavaScript program written by William L Perry III that permits one to layer amino acid sequence on DNA sequence.
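The idea of layering a translation on a DNA sequence can be sketched in Java as follows. This is not the translator's own code: the class and method names are hypothetical, only the first forward reading frame is translated, and the standard genetic code is assumed (stop codons shown as *, codons with ambiguous bases as X).

```java
class TranslateSketch {
    // Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
    private static final String BASES = "TCAG";
    private static final String AMINO_ACIDS =
            "FFLLSSSSYY**CC*W" + "LLLLPPPPHHQQRRRR"
          + "IIIMTTTTNNKKSSRR" + "VVVVAAAADDEEGGGG";

    /** Translates frame 1 of a DNA (or RNA) sequence; trailing partial codons are dropped. */
    static String translate(String dna) {
        String seq = dna.toUpperCase().replace('U', 'T');
        StringBuilder protein = new StringBuilder();
        for (int i = 0; i + 3 <= seq.length(); i += 3) {
            int a = BASES.indexOf(seq.charAt(i));
            int b = BASES.indexOf(seq.charAt(i + 1));
            int c = BASES.indexOf(seq.charAt(i + 2));
            boolean ambiguous = a < 0 || b < 0 || c < 0;
            protein.append(ambiguous ? 'X' : AMINO_ACIDS.charAt(16 * a + 4 * b + c));
        }
        return protein.toString();
    }

    /** Returns the DNA with each amino acid placed under the first base of its codon. */
    static String layer(String dna) {
        StringBuilder aaLine = new StringBuilder();
        for (char aa : translate(dna).toCharArray()) {
            aaLine.append(aa).append("  ");
        }
        return dna + "\n" + aaLine.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(layer("ATGGCCTAA"));
    }
}
```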

RNAknot - is a method for predicting RNA secondary structure that contains the following components: stems, hairpin loops, multi-branched loops or multi-loops, bulge loops, and internal loops, in addition to two types of pseudoknots, H-type pseudoknots and kissing hairpins. RNAknot is based on a genetic algorithm and Greedy Randomized Adaptive Search Procedure (GRASP), and it uses the free energy as the fitness function to evaluate the obtained structures. To validate the performance of the method, 131 tests were performed using two datasets of 26 and 105 RNA sequences, taken from the two databases RNAstrand and Pseudobase respectively. (Reference: El Fatmi A et al. (2019) J Bioinform Comput Biol. 17(5): 1950031).

Motifs:

Two Sample Logo - detects and displays statistically significant differences in position-specific symbol compositions between two sets of multiple sequence alignments. In a typical scenario, two groups of aligned sequences will share a common motif but will differ in their functional annotation. Also available as an online tool. (Reference: Vacic, V. et al. 2006. Bioinformatics 22: 1536-1537).

CiiiDER - is a user-friendly tool for predicting and analysing transcription factor binding sites, designed with biologists in mind. CiiiDER predicts transcription factor binding sites (TFBSs) across regulatory regions of interest, such as promoters and enhancers derived from any species. It can perform an enrichment analysis to identify TFs that are significantly over- or under-represented in comparison to a bespoke background set and thereby elucidate pathways regulating sets of genes of pathophysiological importance. (Reference: Gearing LJ et al. (2019) PLoS ONE doi.org/10.1371/journal.pone.0215495).

Graphic packages:

ImageJ - is a public domain Java image processing program inspired by NIH Image. It can display, edit, analyze, process, save and print 8-bit, 16-bit and 32-bit images. It can read many image formats including TIFF, GIF, JPEG, etc. It can calculate area and pixel value statistics of user-defined selections. It can measure distances and angles. It can create density histograms and line profile plots. It supports standard image processing functions such as contrast manipulation, sharpening, smoothing, edge detection and median filtering.

UPDATED: August, 2025