N.B. Many of the tools that one needs for the analysis of genomes can be found in the DNA Sequence Analysis section. Here we have unique tools for genomic analysis which do not fit easily in that section.
DNA Sequence Quality - Phred - provides base calling, chromatogram display and high quality sequence region evaluation and presentation for up to five sequences simultaneously.
Sequence assembly - you don't need your own contig assembly program when you can use:
EGassembler - aligns and merges sequence fragments resulting from shotgun sequencing or gene transcript (EST) fragments in order to reconstruct the original segment or gene (Reference: A. Masoudi-Nejad et al. 2006. Nucl. Acids Res. 34: W459-462).
CAP3 (PBIL, France) (Reference: Huang, X. & Madan, A. 1999. Genome Res. 9: 868-877).
CAP EST Assembler (Istituto FIRC di Oncologia Molecolare, Italy) - Maximum sequence length for each sequence is 30 kb - Maximum number of sequences 10 kb
Divide-and-Conquer Multiple Sequence Alignment (Universitat Bielefeld, Germany)
Sequencing errors - if your DNA sequence doesn't match the expected protein sequence you can check for errors at GeneWise (EMBL-EBI), which compares a protein sequence to a genomic DNA sequence, allowing for introns and frameshifting errors. Other programs include:
GENIO/frame (Genio/scan N. Mache)
FrameD (Toulouse Genopole, France; T. Schiex et al. 2003. Nucl. Acids Res. 31: 3738-3741)
Shift - a web server for hidden stops in frameshifted translation
path :: protein back-translation and alignment - addresses the problem of finding distant protein homologies where the divergence is the result of frameshift mutations and substitutions. Given two input protein sequences, the method implicitly aligns all the possible pairs of DNA sequences that encode them, by manipulating memory-efficient graph representations of the complete set of putative DNA sequences for each protein. (Reference: Gîrdea M et al. 2010. Algorithms for Molecular Biology 5:)
In-silico.com (Dr. Joseba Bikandi & co-workers, Faculty of Pharmacy, University of the Basque Country) - allows in silico experiments including theoretical PCR amplification, AFLP-PCR, restriction analysis and pulsed field gel electrophoresis [PFGE] with bacterial & archaeal genomes found in the public databases.
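To illustrate what "hidden stops" means in the Shift entry above, here is a minimal Python sketch that lists stop codons lying in the two shifted reading frames of a coding sequence. It is only an illustration of the idea, not the algorithm used by the Shift server; the function name `hidden_stops` and the use of the standard stop codons TAA, TAG and TGA are assumptions made for this example.

```python
# Standard stop codons (assumption: standard genetic code)
STOP_CODONS = {"TAA", "TAG", "TGA"}

def hidden_stops(cds: str) -> dict[int, list[int]]:
    """Return 0-based start positions of stop codons found in the two
    shifted frames (offsets 1 and 2) of a coding sequence whose
    annotated frame starts at offset 0."""
    seq = cds.upper()
    result: dict[int, list[int]] = {}
    for shift in (1, 2):
        positions = []
        # Walk the shifted frame codon by codon
        for i in range(shift, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                positions.append(i)
        result[shift] = positions
    return result

example = hidden_stops("ATGCTGATAGGC")
```

A ribosome that slips forward by one nucleotide reads the offset-1 frame; a slip back by one nucleotide is equivalent to reading the offset-2 frame.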
Specialized annotation - general
ResFinder 2.0 (Acquired antimicrobial resistance gene finder) - uses BLAST for identification of acquired antimicrobial resistance genes in whole-genome data. As input, the method can use both pre-assembled, complete or partial genomes, and short sequence reads from four different sequencing platforms. Tested with 1411 different resistance genes with 100% identity. (Reference: Zankari E et al. 2012. J Antimicrob Chemother. 67:2640-2644)
PlasmidFinder 1.1 - identifies plasmids in totally or partially sequenced isolates of bacteria. The method uses BLAST for identification of replicons of plasmids belonging to the major incompatibility (Inc) groups of Enterobacteriaceae. As input, the method can use both pre-assembled, complete or partial genomes, and short sequence reads from four different sequencing platforms.
PathogenFinder (predicts pathogenic potential) – Based on complete genomes from 513 bacteria annotated as human non-pathogens and 372 bacteria annotated as human pathogens, a database of protein families that are mainly associated with either non-pathogens or pathogens has been created. This database is then used for predicting the pathogenic potential of bacteria. As input, the method can use both pre-assembled, complete or partial genomes, and short sequence reads from four different sequencing platforms. (Reference: Cosentino S et al. 2013. PLoS ONE 8: e77302)
SpeciesFinder 1.0 (Danish Technical University) - predicts the species of bacteria from pre-assembled, complete or partial genomes, and short sequence reads. The prediction is based on the 16S rRNA gene.
snpTree 1.1 (SNPs phylogenetic trees) – online and automatic SNPs analysis and construction of phylogenetic trees. As input, the method can use both pre-assembled, complete or partial genomes, and short sequence reads from three different sequencing platforms. (Reference: Leekitcharoenphon P et al. 2012. BMC Genomics 13 Suppl 7:S6)
KmerFinder 1.1 – predicts the species of bacteria from pre-assembled, complete or partial genomes, and short sequence reads. The prediction is based on the number of co-occurring k-mers (substrings of k nucleotides in DNA sequence data, in this case 16-mers) between the genomes of reference bacteria in a database and the genome provided by the user. (Reference: Hasman H et al. 2013. J Clin Microbiol. 52:139-146)
VIOLIN: Vaccine Investigation and Online Information Network - allows easy curation, comparison and analysis of vaccine-related research data across various human pathogens. VIOLIN is expected to become a centralized source of vaccine information and to provide investigators in basic and clinical sciences with curated data and bioinformatics tools for vaccine research and development. VBLAST: Customized BLAST Search for Vaccine Research allows various search strategies against 77 genomes of 34 pathogens. (Reference: He, Y. et al. 2014. Nucleic Acids Res. 42 (Database issue):D1124-32).
MLST 1.7 (MultiLocus Sequence Typing) - currently only works with assembled genomes and contigs (Reference: Larsen MV et al. 2012. J. Clin. Microbiol. 50: 1355-1361).
FSFinder2 (Frameshift Signal Finder) - Programmed ribosomal frameshifting is involved in the expression of certain genes from a wide range of organisms such as viruses, bacteria and eukaryotes including humans. In programmed frameshifting, the ribosome switches to an alternative frame at a specific site in response to a special signal in a messenger RNA. Programmed frameshifting plays a role in viral particle morphogenesis, autogenous control, and alternative enzymatic activities. The most common type is a -1 frameshift, in which the ribosome shifts a single nucleotide in the upstream direction. The major elements of -1 frameshifting consist of a slippery site, where the ribosome changes reading frames, and a stimulatory RNA structure such as a pseudoknot or stem-loop located a few nucleotides downstream. +1 frameshifts are much less common than -1 frameshifts but are observed in diverse organisms.
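The k-mer comparison described in the KmerFinder entry above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the KmerFinder implementation: it counts distinct k-mers shared between a query and each reference on the forward strand only, and the function names `kmers` and `best_reference` and the toy sequences are hypothetical.

```python
def kmers(seq: str, k: int = 16) -> set[str]:
    """Return the set of distinct k-mers in a sequence (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def best_reference(query: str, references: dict[str, str], k: int = 16) -> tuple[str, int]:
    """Return the reference sharing the most k-mers with the query,
    together with the number of shared k-mers."""
    query_kmers = kmers(query, k)
    scores = {name: len(query_kmers & kmers(ref, k))
              for name, ref in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Toy example with k=4 so the sequences stay short
refs = {"species_a": "ACGTACGA", "species_b": "GGGGCCCC"}
hit = best_reference("ACGTACGTTT", refs, k=4)
```

With real genomes one would keep k at 16, as stated in the entry, and would probably also index the reverse complement of each sequence.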
Specialized annotation - CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)
CRISPRfinder - enables the easy detection of CRISPRs in locally-produced data and consultation of CRISPRs present in the database. It also gives information on the presence of CRISPR-associated (cas) genes when they have been annotated as such. (Reference: I. Grissa et al. 2007. Nucl. Acids Res. 35 (Web Server issue): W52-W57).
CRISPRmap - provides a quick and detailed insight into repeat conservation and diversity of both bacterial and archaeal systems. It comprises the largest dataset of CRISPRs to date and enables comprehensive independent clustering analyses to determine conserved sequence families, potential structure motifs for endoribonucleases, and evolutionary relationships. (Reference: S.J. Lange et al. 2013. Nucleic Acids Research, 41: 8034-8044).
CRISPI: a CRISPR Interactive database - includes a complete repertory of CRISPR-associated genes (cas). A user-friendly web interface with many graphical tools and functions allows users to extract results, find CRISPRs in personal sequences or calculate sequence similarity with spacers. (Reference: Rousseau C et al. 2009. Bioinformatics. 25: 3317–3318).
CRISPRTarget - predicts the most likely targets of CRISPR RNAs. This can be used to discover targets in newly sequenced genomic or metagenomic data. (Reference: Biswas A et al. 2013. RNA Biol. 10:817-827)
Specialized annotation - virulence determinants. This is of particular interest to those working on bacteriophages for therapy
VirulenceFinder (Danish Technical University) – identification of virulence genes. The method uses BLAST for identification of known virulence genes in Escherichia coli. The method is being extended to also include virulence genes for Enterococcus and Staphylococcus aureus. As input, the method can use both pre-assembled, complete or partial genomes, and short sequence reads from four different sequencing platforms.
ClanTox: a classifier of short animal toxins - predicts whether each sequence is toxin-like and provides a ranked list of positively predicted candidates according to statistical confidence. For each protein, additional information is presented including the presence of a signal peptide, the number of cysteine residues and the associated functional annotations. (Reference: G. Naamati et al. 2009. Nucleic Acids Res. 37(Web Server issue): W363–W368).
BTXpred - aims to predict bacterial toxins and their functions from primary amino acid sequence using SVM, HMM and PSI-BLAST. It allows users to (a) predict bacterial toxins with 96.07% accuracy; (b) classify bacterial toxins into exotoxins and endotoxins with an accuracy of 95.71%; and (c) classify exotoxins into seven different functions depending upon their molecular targets: i) activate adenylate cyclase, ii) activate guanylate cyclase, iii) food poisoning, iv) neurotoxins, v) macrophage cytotoxin, vi) vacuolating cytotoxin and vii) thiol activated cytotoxin, with 100% overall accuracy. (Reference: Saha S & Raghava GP. 2007. In Silico Biol. 7: 405-412).
t3db the Toxin and Toxin Target Database - combines detailed toxin data with comprehensive toxin target information. The database currently houses 3,053 toxins which are linked to 1,670 corresponding toxin target records. Each toxin record (ToxCard) contains over 50 data fields and holds information such as chemical properties and descriptors, toxicity values, molecular and cellular interactions, and medical information. (Reference: Lim E et al. 2010. Nucleic Acids Res. 38(Database issue): D781-786).
DBETH (Database of Bacterial ExoToxins for Humans) - is a database of sequences, structures, interaction networks and analytical results for 229 exotoxins from 26 different human pathogenic bacterial genera. All toxins are classified into 24 different toxin classes. The aim of DBETH is to provide a comprehensive database for human pathogenic bacterial exotoxins. (Reference: Chakraborty A et al. 2012. Nucleic Acids Res. 40(Database issue): D615-620).
VFDB - is an integrated and comprehensive database of virulence factors for bacterial pathogens (also including Chlamydia and Mycoplasma). (Reference: L.H. Chen et al. 2012. Nucleic Acids Res. 40(Database issue): D641-D645).
MvirDB - LLNL Virulence Database - allows BLAST searching against a database of sequences representing known toxins, virulence factors, and antibiotic resistance genes (Reference: C. E. Zhou et al. 2007. Nucl. Acids Res. 35 (suppl 1): D391-D394).
Pathogenicity island database (PAI DB) has made an effort to collect known PAIs and to detect the potential PAI regions in the prokaryotic complete genomes. Pathogenicity islands (PAIs) are distinct genetic elements of pathogens encoding various virulence factors. Site allows text and BLAST searches. (Reference: Yoon SH et al. 2007. Nucleic Acids Res. 35: D395-D400).
PanDaTox (Pan Genomic Database for Genomic Elements Toxic to Bacteria) - is a database of genes and intergenic regions that are unclonable in E. coli, to aid in the discovery of new antibiotics and biotechnologically beneficial functional genes. It is also designed to improve the efficiency of metabolic engineering. BLAST Search feature included. (Reference: Amitai G & Sorek R. 2012. Bioengineered 3: 218-221).
VirulentPred - is an SVM-based method to predict bacterial virulent protein sequences, which can be used to screen virulent proteins in proteomes. Together with experimentally verified virulent proteins, several putative, non-annotated and hypothetical protein sequences have been predicted to be high scoring virulent proteins by the prediction method. (Reference: Garg A & Gupta G. 2008. BMC Bioinformatics 9: 62).
Genome comparisons and synteny:
SyntTax - is a web server linking synteny to prokaryotic taxonomy. SyntTax incorporates a full hierarchical taxonomic tree allowing intuitive access to all completely sequenced prokaryotes (Archaea and Bacteria). Single or multiple organisms can be chosen on the basis of their lineage by selecting the corresponding rank nodes in the tree. This is my favourite among the synteny programs (Reference: Oberto J. 2013. BMC Bioinformatics. 14:4). The results below were generated using the heat-shock sigma factor (RpoH) from Salmonella Typhimurium against the Pseudomonadales.
Cinteny Server for Synteny Identification and Analysis of Genome Rearrangement (A. U. Sinha & J. Meller, University of Cincinnati, USA) - this server can be used for finding regions syntenic across multiple genomes and measuring the extent of genome rearrangement using reversal distance as a measure. You may create a project and upload your own data or work with pre-loaded prokaryote or eukaryote data.
AutoGRAPH is an integrated web server for multi-species comparative genomic analysis. It is designed for constructing and visualizing synteny maps between two or three species, determination and display of macrosynteny and microsynteny relationships among species, and for highlighting evolutionary breakpoints.
The web server constructs synteny maps by pairwise comparison of marker/anchor orders between a reference chromosome and one or two tested genome(s). It permits users to visualize and characterize several features: Conserved segments (CS), Conserved Segments Ordered (CSO) and breakpoints. (Reference: Derrien T et al. 2007. Bioinformatics 23:498-499).
Sibelia (University of California San Diego, USA) - is a tool for finding synteny blocks in multiple closely related microbial genomes using iterative de Bruijn graphs. Unlike most other tools, Sibelia can find synteny blocks that are repeated within genomes as well as blocks shared by multiple genomes. It represents synteny blocks in a hierarchical structure with multiple layers, each representing a different level of granularity.
GeneOrder 2.0 (D. Seto, Bioinformatics & Computational Biology, George Mason Univ., U.S.A.) is ideal for comparing small GenBank genomes (up to 0.25 Mb), while GeneOrder 3.0 extends the limits to approx. 2.0Mb. Each gene from the Query sequence is compared to all of the genes from the Reference database using BLASTP. There are two display formats: graphical and tabular. Currently the graph is an applet and must be saved as a "SCREEN SHOT".
CoreGenes (D. Seto, Bioinformatics & Computational Biology, George Mason Univ., U.S.A.) is designed to analyze two to five genomes simultaneously, generating a table of related genes - orthologs and putative orthologs. These entries are linked to their GenBank data. It has a limit of 0.35 Mb, while the newer version CoreGenes 2.0 extends the limit to approx. 2.0Mb. If your data is not present in GenBank use this site.
CoreGenes 3 (D. Seto & P. Mahadevan, Bioinformatics & Computational Biology, George Mason Univ., U.S.A) - tallies the total number of genes in common between the two genomes being compared; displays the percent value of genes in common with a specific genome; determines the unique genes contained in a pair of proteomes. CoreGenes 3.5 is the batch CoreGenes server. I have tried this out on a group of P22likevirus species with great success.
WebACT - this is the web version of ACT (Artemis Comparison Tool), a DNA sequence comparison viewer based on Artemis (Reference: T.J. Carver et al. Bioinformatics 21: 3422-3423). Visit the database page of EMBL-EBI and select EMBL and "Standard Query Form" to determine the EMBL accession number for the sequence you are interested in.
WebGMAP - is a public web service for annotating and mapping individual cDNA sequences to the genomes of many eukaryote species, currently including Arabidopsis thaliana, Chlamydomonas reinhardtii, Glycine max, Oryza sativa, Physcomitrella patens and Populus trichocarpa. (Reference: C. Liang et al. 2009. Nucl. Acids Res. 37(Web Server issue):W77-W83)
Panseq (Chad Laing, Public Health Agency of Canada) - a group of tools for the analysis of the 'pan-genome' of a group of genomic sequences. The pan-genome of a bacterial species consists of a core genome and an accessory gene pool, the latter of which allows subpopulations of the organism to adapt to specific environments. The tools include Novel Region Finder, which will find sequences that are unique to a strain or group of strains with respect to another strain or group of strains. Pan-genome Analysis identifies the pan-genome among your sequences, finds SNPs in the core genome and determines the distribution of accessory genomic regions. Loci Selector identifies loci that offer the best discrimination among your dataset. (Reference: Laing, C. et al. 2010. BMC Bioinformatics. 11: 461).
PARIGA - enables users to perform all-against-all BLAST searches on two sets of sequences selected by the user. Moreover, since it stores the two BLAST outputs in a database of serialized Python objects, results can be filtered according to several parameters in real time, without re-running the process and without additional programming effort. (Reference: Orsini M. et al. 2013. PLoS One 8(5):e62224).
EDGAR "Efficient Database framework for comparative Genome Analyses using BLAST score Ratios" - EDGAR is designed to automatically perform genome comparisons in a high throughput approach and can be used for core genome, pan genome and singleton analysis, and Venn diagram construction. (Reference: Blom J. et al. 2009. BMC Bioinformatics 10: 154).
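The core genome, pan genome and singleton sets mentioned in the EDGAR and Panseq entries above reduce to simple set operations once genes have been grouped into ortholog families. The sketch below assumes that grouping has already been done (for example from BLAST results) and that each genome is represented by a set of family identifiers; the names `score_ratio` and `pan_core_singletons` and the identifiers `g1`..`g5` are hypothetical. The score ratio shown is the usual definition of a BLAST score ratio (hit score divided by the query's self-hit score); the cutoff EDGAR applies to it is not given here and is not assumed.

```python
def score_ratio(hit_score: float, self_score: float) -> float:
    """BLAST score ratio: bit score of a hit divided by the bit score
    of the query aligned against itself (1.0 means identical)."""
    return hit_score / self_score

def pan_core_singletons(gene_sets: dict[str, set[str]]) -> dict:
    """Given gene-family sets per genome (at least one genome), return
    the pan genome (union), core genome (intersection) and the
    families unique to each genome (singletons)."""
    all_sets = list(gene_sets.values())
    pan = set().union(*all_sets)
    core = set.intersection(*all_sets)
    singletons = {}
    for name, genes in gene_sets.items():
        # Families present in any other genome
        others = set().union(*(g for n, g in gene_sets.items() if n != name))
        singletons[name] = genes - others
    return {"pan": pan, "core": core, "singletons": singletons}

genomes = {
    "A": {"g1", "g2", "g3"},
    "B": {"g1", "g2", "g4"},
    "C": {"g1", "g5"},
}
summary = pan_core_singletons(genomes)
```

The sizes of the overlaps between these sets are also what a Venn diagram of the genomes would display.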
Specialized annotation - AAI and ANI
ANI (Average Nucleotide Identity) calculator - estimates the average nucleotide identity using both best hits (one-way ANI) and reciprocal best hits (two-way ANI) between two genomic datasets. Typically, the ANI values between genomes of the same species are above 95% (e.g., Escherichia coli). Values below 75% are not to be trusted, and AAI should be used instead. This tool supports both complete and draft genomes (multi-fasta). (Reference: Goris J et al. 2007. Int J Syst Evol Microbiol. 57(Pt 1): 81-91).
GGDC (Genome-To-Genome Distance Calculator) - provides methods for inferring whole-genome distances which are well able to mimic DNA-DNA hybridization (DDH). Values calculated with GGDC yield a somewhat better correlation with wet-lab DDH values than alternative approaches such as "ANI". These distance functions can also cope with heavily reduced genomes and repetitive sequence regions. Some of them are also very robust against missing fractions of genomic information (due to incomplete genome sequencing). Thus, this web service can be used for genome-based species delineation. (Reference: Meier-Kolthoff JP et al. 2013. BMC Bioinformatics 14: 60).
POGO-DB - Based on computationally intensive whole-genome BLASTs, POGO-DB provides several metrics on pairwise genome comparisons: (a) Average Amino Acid Identity of all bi-directional best BLAST hits that covered at least 70% of the sequence and had at least 30% sequence identity; (b) Genomic Fluidity that estimates the similarity in gene content between two genomes; (c) Number of orthologs shared between two genomes (as defined by two criteria); (d) Pairwise identity of the most similar 16S rRNA genes; (e) Pairwise identity of 73 additional globally-conserved marker genes (which were determined by us to exist in at least 90% of all the genomes). (Reference: Lan Y et al. 2014. Nucl. Acids Res. 42 (D1): D625-D632).
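The difference between one-way (best hits) and two-way (reciprocal best hits) averages in the ANI and AAI entries above can be sketched as follows. This is a simplified illustration, not the code of any of the listed servers: it assumes BLAST hits have already been parsed into records, the `Hit` fields and function names are hypothetical, and the default cutoffs (30% identity, 70% coverage) are simply the ones quoted in the POGO-DB entry and can be changed.

```python
from typing import Iterable, NamedTuple

class Hit(NamedTuple):
    query: str
    subject: str
    identity: float  # percent identity
    coverage: float  # percent of the query covered by the alignment
    score: float     # bit score

def best_hits(hits: Iterable[Hit], min_identity: float = 30.0,
              min_coverage: float = 70.0) -> dict[str, Hit]:
    """Keep the highest-scoring hit per query among hits passing the cutoffs."""
    best: dict[str, Hit] = {}
    for h in hits:
        if h.identity < min_identity or h.coverage < min_coverage:
            continue
        if h.query not in best or h.score > best[h.query].score:
            best[h.query] = h
    return best

def one_way_identity(hits: Iterable[Hit]) -> float:
    """Mean identity over best hits (raises ZeroDivisionError if none pass)."""
    best = best_hits(hits)
    return sum(h.identity for h in best.values()) / len(best)

def two_way_identity(hits_ab: Iterable[Hit], hits_ba: Iterable[Hit]) -> float:
    """Mean identity over reciprocal best hits between genomes A and B."""
    best_ab, best_ba = best_hits(hits_ab), best_hits(hits_ba)
    pairs = [(f, best_ba[f.subject]) for f in best_ab.values()
             if f.subject in best_ba and best_ba[f.subject].subject == f.query]
    return sum((f.identity + r.identity) / 2 for f, r in pairs) / len(pairs)

a_vs_b = [Hit("a1", "b1", 98.0, 100.0, 500), Hit("a1", "b2", 80.0, 90.0, 300),
          Hit("a2", "b2", 90.0, 95.0, 400), Hit("a3", "b1", 60.0, 50.0, 100)]
b_vs_a = [Hit("b1", "a1", 98.0, 100.0, 500), Hit("b2", "a1", 80.0, 90.0, 300)]
```

In this toy data the pair a2/b2 counts toward the one-way value but not the two-way value, because b2's own best hit is a1.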
NCBI Prokaryotic Genomes Automatic Annotation Pipeline. This will completely annotate your bacterial genome and provide you with a Sequin submission file. N.B. an NCBI Phage Automatic Annotation Pipeline is in development.
IGS Prokaryotic Annotation Pipeline (Institute for Genome Sciences, University of Maryland, U.S.A.) IGS has developed a comprehensive automated pipeline for use with Bacteria and Archaea. The pipeline predicts protein-coding genes as well as non-coding RNAs. Similarity evidence is collected for predicted proteins with a variety of methods including pairwise alignments, HMM searches, and multiple motif prediction tools. A hierarchical rule-based system is used to assign annotation to each protein based on the highest quality available evidence. Results are loaded into a relational database and can be viewed using the Manatee annotation visualization and curation tool. Results are also available in multiple standard flat file formats. In the case of IGS and NCBI one has to contact them by email requesting their services.
RAST (Rapid Annotation using Subsystem Technology) is a fully-automated service for annotating bacterial and archaeal genomes. It provides high quality genome annotations for these genomes across the whole phylogenetic tree. Requires registration. (Reference: Aziz, RK et al. 2008. BMC Genomics 9:75.). See also MyRAST under Molecular Biology Freeware for Windows.
BASys Bacterial Annotation Tool - this incredible tool supports automated, in-depth annotation of bacterial genomic sequences. It accepts raw DNA sequence data and an optional list of gene identification information (Glimmer) and provides extensive textual annotation and hyperlinked image output. BASys uses >30 programs to determine 60 annotation subfields for each gene, including gene/protein name, GO function, COG function, possible paralogues and orthologues, molecular weight, isoelectric point, operon structure, subcellular localization, signal peptides, transmembrane regions, secondary structure, 3D structure, reactions and pathways. (Reference: G.H. Van Domselaar et al. 2005. Nucl. Acids Res. 33(Web Server issue):W455-W459).
MAKER Web Annotation Service (MWAS) is an easily configurable web-accessible genome annotation pipeline. Its purpose is to allow research groups with small to intermediate amounts of eukaryotic and prokaryotic genome sequence (i.e. BAC clones, small whole genomes, preliminary sequencing data, etc.) to independently annotate and analyse their data and produce output that can be loaded into a genome database. (Reference: Holt, C. & Yandell, M. 2011. BMC Bioinformatics 12:491).
xBASE bacterial genome annotation service - To use, submit a file containing one or more FASTA-formatted nucleotide sequences (contigs produced by a whole genome assembler, for example). To guide gene prediction and annotation, select the closest complete reference sequence for your genome. (Reference: Chaudhuri RR et al. 2008. Nucleic Acids Res. 36(Database issue): D543-546).
GenSAS - Genome Sequence Annotation Server provides a one-stop website with a single graphical interface for running multiple structural and functional annotation tools, enabling visualization and manual curation of genome sequences. Users can upload sequences into their account and run gene prediction programs, protein homology searches, map ESTs, identify repeats, ORFs and SSRs with custom parameter settings. Each analysis is displayed on separate tracks of the graphical interface with custom editable tracks to select final annotation of features and create gff3 files for upload to genome browsers such as GBrowse. Additional programs can be easily added using this Drupal based software.
Viral Genome ORF Reader (VIGOR) - supports high throughput feature prediction and annotation. VIGOR employs an extrinsic strategy and boasts sensitivity and specificity greater than 98% for the RNA viral genomes we tested. Genome-specific features identified by VIGOR include frameshifts, ribosomal slippage, RNA editing, stop codon read-through, overlapping genes, embedded genes, and mature peptide cleavage sites. Genotyping capability for influenza and rotavirus is built into the program. (Reference: S. Wang et al. 2010. BMC Bioinformatics 11:451)
FLAN (FLu ANnotation) - an NCBI web server for genome annotation of user-provided influenza A virus or influenza B virus sequences. It can validate and predict protein sequences encoded by an input flu sequence. (Reference: Y. Bao et al. 2007. Nucleic Acids Res. 35(Web Server issue): W280-W284).
CpGAVAS - allows accurate chloroplast genome annotation, the generation of circular maps, the provision of useful analysis results of the annotated genome, the creation of files that can be submitted to GenBank directly. (Reference: C. Liu et al. 2012. BMC Genomics 13: 715)
Genome Annotation Transfer Utility (GATU) annotates a genome based on a very closely related reference genome. The proteins/mature peptides of the reference genome are BLASTed against the genome to be annotated in order to find the genes/mature peptides in the genome to be annotated (Reference: T. Tcherepanov et al. 2006. BMC Genomics 7:150.)
ORF (Groningen Biomolecular Sciences and Biotechnology Institute, Haren, the Netherlands) - offers a choice of Glimmer, ZCurve or GeneMark predictions coupled with GenBank or FASTA-formatted output. Works very well and quickly with phage-sized genomes.
BioGPS (The Scripps Research Institute, USA) - is a one-stop gene annotation portal that emphasizes user customizability and community extensibility. It is a complete resource for learning about gene and protein function.
BAGEL (Groningen Biomolecular Sciences and Biotechnology Institute, Haren, the Netherlands) - will determine from an existing or non submitted GenBank file the presence of bacteriocins based on a database containing information of known bacteriocins and adjacent genes involved in bacteriocin activity.
MICheck (MIcrobial genome Checker) - enables rapid verification of sets of annotated genes and frameshifts in previously published bacterial genomes, or genomes for which the user has a *.gbk file. This tool can be seen as a preliminary step before the functional re-annotation step to check quickly for missing or wrongly annotated genes. It worked nicely with phage genomes from 43-135kb. (Reference: S. Cruveiller et al. 2005. Nucl. Acids Res. 33: W471- W479).
WebGeSTer - Genome Scanner for Terminators - my favourite terminator search program is finally web enabled. Please note that if you want to analyze data from a *.gbk file you need to use their conversion program "GenBank2GeSTer" first. A complete description of each terminator including a diagram is produced by this program. This site is linked to an extensive database of transcriptional terminators in bacterial genomes (WebGeSTer DB) (Reference: Mitra A. et al. 2011. Nucl. Acids Res. 39(Database issue):D129-35).
RibEx: Riboswitch Explorer - scans <40kb DNA for potential genes (which are linked to BLASTP) and several hundred regulatory elements, including riboswitches. If you click on "search for attenuators" it finds terminators and antiterminators. It presents the calculated genes and permits BLAST analysis at NCBI (Reference: C. Abreu-Goodger & E. Merino. 2005. Nucl. Acids Res. 33: W690-W692).
tRNAs: tRNAscan-SE (University of California at San Diego, U.S.A.) is incredibly sensitive & also provides secondary structure diagrams of the tRNA molecules. Alternatively use ARAGORN (Reference: Laslett, D. & Canback, B. 2004. Nucleic Acids Research 32:11-16).
LTR_Finder - is an efficient program for finding full-length LTR retrotransposons in genome sequences. The size of the input file is now limited to 50MB (Reference: Z. Xu & H. Wang. 2007. Nucl. Acids Res. 35(Web Server issue): W265-W268).
RTAnalyzer - finds retrotransposons and detects L1 retrotransposition signatures (Reference: J-F. Lucier et al. 2007. Nucl. Acids Res. 35(Web Server issue):W269-W274).
FancyGene - is a fast and user-friendly web-based tool for producing images of one or more genes directly on the corresponding genomic locus. Starting from a variety of input formats, FancyGene rebuilds the basic components of a gene (UTRs, intron, exons). Once the initial representation is obtained, the user can superimpose additional features—such as protein domains and/or a variety of biological markers—in specific positions. (Reference: D. Rambaldi & F.D. Ciccarelli. 2009. Bioinformatics 25: 2281-2282).
SLEP is a pipeline for predicting the localization of bacterial proteins starting from genome sequences. It combines the results of several tools: Glimmer, TMHMM, PRODIV-TMHMM, LipoP, PSortB.
Ori-Finder finds oriCs in bacterial genomes based on an integrated method comprising the analysis of base composition asymmetry using the Z-curve method, distribution of DnaA boxes, and the occurrence of genes frequently close to oriCs. (Reference: F. Gao & C.-T. Zhang. 2008. BMC Bioinformatics. 9:79).
MG-RAST (Metagenome Rapid Annotation using Subsystem Technology) is a fully-automated service for annotating metagenome samples. It provides annotation of sequence fragments, their phylogenetic classification and an initial metabolic reconstruction. The service also provides means for comparing phylogenetic classifications and metabolic reconstructions of metagenomes (Reference: F. Meyer et al. 2008. BMC Bioinformatics 9: 386).
Correcting genome annotations:
gbk2tbl (Andre Villegas, Public Health Agency of Canada) - One of the problems with GenBank is that scientists do not update their submission data nor correct errors. In part this is due to laziness; but is also due to the fact that GenBank is, in most cases, unwilling to accept a new version of the Sequin file. Tbl2asn is a command-line program that automates the creation of sequence records for submission to GenBank but, from my perspective, it is not easy to use. Gbk2tbl will generate a five-column table of the genome features, which can be easily edited in Notepad.
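For readers who have not seen the five-column feature table that tbl2asn reads, the following Python sketch writes one from a list of features. It is an illustration of the table layout only and is not the gbk2tbl program; the function name `to_feature_table`, the dictionary keys and the sequence identifier `phage1` are hypothetical. Coordinates are 1-based, and a feature on the minus strand is written with its start greater than its stop.

```python
def to_feature_table(seq_id: str, features: list[dict]) -> str:
    """Format features as a tab-delimited five-column feature table:
    columns 1-3 hold start, stop and feature key; columns 4-5 hold
    qualifier name and value on the following lines."""
    lines = [f">Feature {seq_id}"]
    for feat in features:
        start, end = feat["start"], feat["end"]
        if feat.get("strand", 1) < 0:
            # Minus-strand features are written high-to-low
            start, end = end, start
        lines.append(f"{start}\t{end}\t{feat['key']}")
        for qualifier, value in feat.get("qualifiers", {}).items():
            lines.append(f"\t\t\t{qualifier}\t{value}")
    return "\n".join(lines) + "\n"

table = to_feature_table("phage1", [
    {"start": 1, "end": 300, "strand": -1, "key": "CDS",
     "qualifiers": {"product": "hypothetical protein"}},
])
```

Because the result is plain tab-delimited text, it can be corrected in any text editor before being passed back to the submission tools.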
CGView Server - is a comparative genomics tool for circular genomes that allows sequence feature information to be visualized in the context of sequence analysis results. A genome sequence is supplied to the program in FASTA, GenBank, EMBL or raw format. Up to three comparison sequences (or sequence sets) in FASTA format can also be submitted. The CGView Server uses BLAST to compare the genome sequence to the comparison sequences, and then converts the results and any available feature information (from the GenBank, EMBL or optional GFF file) or analysis information (from an optional GFF file) into a high-quality graphical map showing the entire genome sequence, or a zoomed view of a region of interest. Several options are available for specifying how the BLAST comparisons are conducted, and for controlling how results are displayed.(Reference: Grant JR & Stothard P. 2008. Nucleic Acids Res. 36(Web Server issue): W181-184)
Jena Prokaryotic Genome Viewer (JPGV) - from a GenBank flatfile (*.gbk) generates linear or circular plots; including if desired GC content, GC skew, purine excess and keto excess can be displayed. Also allows BLAST analysis against related genomes. Requires free registration.
GenomeVx - makes editable, publication-quality, maps of mitochondrial and chloroplast genomes and of large plasmids. These maps show the location of genes and chromosomal features as well as a position scale. The program takes as input either raw feature positions or GenBank records. In the latter case, features are automatically extracted and colored, an example of which is given. Output is in the Adobe Portable Document Format (PDF) and can be edited by programs such as Adobe Illustrator.(Reference: G. Conant & K. Woolfe. 2008. Bioinformatics 24:861-862)
DNAPlotter - is an interactive Java application for generating circular and linear representations of genomes. Making use of the Artemis libraries to provide a user-friendly method of loading in sequence files (EMBL, GenBank, GFF) as well as data from relational databases, it filters features of interest to display on separate user-definable tracks. It can be used to produce publication quality images for papers or web pages.(Reference: Carver, T. et al. 2008. Bioinformatics 25:119-120)
GeneWiz (Center for Biological Sequence Analysis, Danish Technical University) - produces linear or circular genome atlases. Ready-made atlases are available for most bacteria, but by uploading custom data in GenBank format (.gbk) one can make one's own diagram showing the genetic and physical properties of a genome.
OrganellarGenomeDRAW - is a suite of software tools that enable users to create high-quality visual representations of both circular and linear annotated genome sequences provided as GenBank files or accession numbers. Although all types of DNA sequences are accepted as input, the software has been specifically optimized to properly depict features of organellar genomes. A recent extension facilitates the plotting of quantitative gene expression data, such as transcript or protein abundance data, directly onto the genome map.
MobilomeFINDER - web-based tools for in silico and experimental discovery of bacterial genomic islands (Reference: H-Y. Ou et al. 2007. Nucl. Acids Res. 35(Web Server issue): W97-W104)
Phage_Finder - was created to identify prophage regions in completed bacterial genomes. Using a test dataset of 42 bacterial genomes whose prophages have been manually identified, Phage_Finder found 91% of the regions, resulting in 7% false positive and 9% false negative prophages. A search of 302 complete bacterial genomes predicted 403 putative prophage regions, accounting for 2.7% of the total bacterial DNA. Analysis of the 285 putative attachment sites revealed tRNAs are targets for integration slightly more frequently (33%) than intergenic (31%) or intragenic (28%) regions, while tmRNAs were targeted in 8% of the regions. (Reference: D.E. Fouts. 2006. Nucleic Acids Res. 34: 5839–5851).
Prophinder - is a tool for detecting prophages in bacterial genomes. Input is a GenBank-formatted file.
PHAST (PHAge Search Tool) - is designed to rapidly and accurately identify, annotate and graphically display prophage sequences within bacterial genomes or plasmids. It accepts either raw DNA sequence data or partially annotated GenBank-formatted data and rapidly performs a number of database comparisons as well as phage “cornerstone” feature identification steps to locate, annotate and display prophage sequences and prophage features. Relative to other prophage identification tools, PHAST is up to 40 times faster and up to 15% more sensitive. It also provides richly annotated tables on prophage features and prophage “quality” and distinguishes between intact and incomplete prophage. PHAST also generates downloadable, high-quality, interactive graphics that display all identified prophage components in both circular and linear genomic views. Furthermore, tests indicate that PHAST is as accurate as or slightly more accurate than all available phage-finding tools, with a sensitivity of 85.4% and a positive predictive value of 94.2%. (Reference: Zhou, Y. et al. 2011. Nucl. Acids Res. 39(suppl 2): W347-W352).
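For readers unfamiliar with the two accuracy measures quoted above, the following short Python sketch shows how sensitivity and positive predictive value (PPV) are computed from counts of true positives, false negatives and false positives, using the standard definitions. The counts in the example are hypothetical and are not taken from the PHAST paper.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of real prophages that the tool found: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def positive_predictive_value(true_pos, false_pos):
    """Fraction of predicted prophages that are real: TP / (TP + FP)."""
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical benchmark counts, for illustration only.
    tp, fn, fp = 90, 10, 5
    print(f"sensitivity = {sensitivity(tp, fn):.3f}")
    print(f"PPV         = {positive_predictive_value(tp, fp):.3f}")
```

A tool can score well on one measure and poorly on the other, which is why prophage finders are usually reported with both.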
IslandViewer - integrates two sequence-composition genomic island (GI) prediction methods, SIGI-HMM and IslandPath-DIMOB, and one comparative GI prediction method, IslandPick (Reference: Langille et al. 2008. BMC Bioinformatics 9: 329).
PAIDB (PAthogenicity Island DataBase) - collects known pathogenicity islands (PAIs) and detects potential PAI regions in complete prokaryotic genomes. PAIs are distinct genetic elements of pathogens encoding various virulence factors. (Reference: Yoon SH et al. 2007. Nucleic Acids Res. 35(Database issue): D395-D400).
GeneDesign - is an excellent resource for designing synthetic genes. It includes tools for codon optimization and removal of restriction sites. (Reference: Richardson, S.M. et al. 2006. Genome Research 16:550-556)
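To give a feel for what removing a restriction site involves, here is a minimal Python sketch that eliminates a recognition site from a coding sequence by swapping one codon for a synonymous one, so that the encoded protein is unchanged. This is only an illustration under simple assumptions (standard genetic code, in-frame sequence, first workable swap is taken); it is not GeneDesign's algorithm, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate an in-frame DNA sequence, ignoring any trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def remove_site(cds, site):
    """Remove every occurrence of `site` using synonymous codon swaps.

    Raises ValueError if an occurrence cannot be removed by changing a
    single overlapping codon.
    """
    cds, site = cds.upper(), site.upper()
    pos = cds.find(site)
    while pos != -1:
        first, last = pos // 3, (pos + len(site) - 1) // 3
        replaced = False
        for idx in range(first, last + 1):
            codon = cds[3 * idx:3 * idx + 3]
            if len(codon) < 3:
                continue  # trailing partial codon cannot be swapped
            for alt, aa in CODON_TABLE.items():
                if alt == codon or aa != CODON_TABLE[codon]:
                    continue
                candidate = cds[:3 * idx] + alt + cds[3 * idx + 3:]
                # Accept the swap only if it reduces the number of sites.
                if candidate.count(site) < cds.count(site):
                    cds, replaced = candidate, True
                    break
            if replaced:
                break
        if not replaced:
            raise ValueError(f"could not remove {site} at position {pos}")
        pos = cds.find(site)
    return cds


if __name__ == "__main__":
    original = "ATGGAATTCTAA"  # toy gene containing an EcoRI site (GAATTC)
    cleaned = remove_site(original, "GAATTC")
    print(cleaned, translate(cleaned) == translate(original))
```

A real design tool would also weigh codon usage in the host organism when choosing among synonymous codons, which this sketch does not attempt.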
Orphelia - is a metagenomic ORF-finding tool for the prediction of protein-coding genes in short, environmental DNA sequences of unknown phylogenetic origin. Orphelia is based on a two-stage machine learning approach introduced by its authors. After the initial extraction of ORFs, linear discriminants are used to extract features from those ORFs. Subsequently, an artificial neural network combines the features and computes a gene probability for each ORF in a fragment. A greedy strategy then computes a likely combination of high-scoring ORFs subject to an overlap constraint. (Reference: K.J. Hoff et al. 2009. Nucl. Acids Res. 37(Web Server issue): W101-W105)
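The final greedy step described above can be sketched in a few lines of Python. This is a generic illustration of greedy selection under an overlap constraint, not Orphelia's actual code: the tuple layout, the function names and the `max_overlap` value of 60 bases are assumptions made for the example, and the probabilities are made up.

```python
def overlap(a, b):
    """Number of bases shared by two ORFs given as (start, end, prob), end exclusive."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def greedy_select(orfs, max_overlap=60):
    """Pick ORFs in order of decreasing probability.

    An ORF is kept only if it overlaps every already chosen ORF by at most
    `max_overlap` bases (a hypothetical threshold).
    """
    chosen = []
    for orf in sorted(orfs, key=lambda o: o[2], reverse=True):
        if all(overlap(orf, kept) <= max_overlap for kept in chosen):
            chosen.append(orf)
    return sorted(chosen)  # report in positional order


if __name__ == "__main__":
    # Hypothetical candidate ORFs on one fragment: (start, end, gene probability).
    candidates = [(0, 300, 0.9), (280, 600, 0.8), (100, 400, 0.5)]
    print(greedy_select(candidates))
```

In this toy case the lowest-scoring candidate is dropped because it overlaps the best-scoring ORF by far more than the allowed amount, while the two slightly overlapping ORFs are both kept.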