GENOMICS

Many of the tools that one needs for the analysis of genomes can be found in the DNA Sequence Analysis section. Here we have unique tools for genomic analysis which do not fit easily in that section.

DNA sequencing:

The DNA Sequence Quality Machine at IFOM - Phred (The FIRC Institute for Molecular Oncology, Italy) - provides base calling, chromatogram display and high quality sequence region evaluation and presentation for up to five sequences simultaneously. For further information on Phred see here.

Sequence assembly - you don't need your own contig assembly program when you can use:

CAP online at Infobiogen (France)
CAP3 (PBIL, France)
CAP EST Assembler (Istituto FIRC di Oncologia Molecolare, Italy)
Divide-and-Conquer Multiple Sequence Alignment (Universitat Bielefeld, Germany)

Sequencing errors - if your DNA sequence doesn't match the expected protein sequence, you can check for errors with ERR_WISE or Wise2: Intelligent algorithms for DNA searches (EBI, United Kingdom) or SEQERR - Detection of Frameshift Errors in Coding Regions.

In-silico.com (Dr. Joseba Bikandi & co-workers, Faculty of Pharmacy, University of the Basque Country) - allows in silico experiments including theoretical PCR amplification, AFLP-PCR, restriction analysis and pulsed field gel electrophoresis [PFGE] with bacterial & archaeal genomes found in public databases.
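
To make the idea of an in silico restriction analysis concrete, here is a minimal sketch. It is not the code behind In-silico.com: the function name `digest_linear` and the toy sequence are hypothetical, and the sketch assumes a linear molecule and a palindromic recognition site (EcoRI, G^AATTC), so searching one strand is enough.

```python
def digest_linear(seq, site, cut_offset):
    """Return fragment lengths after cutting a linear sequence at every site.

    cut_offset is the number of bases of the recognition site that lie
    to the left of the cut (1 for EcoRI, which cuts G^AATTC).
    """
    seq = seq.upper()
    site = site.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        # Step one base forward so overlapping sites are also found.
        pos = seq.find(site, pos + 1)
    # Fragment lengths are the distances between consecutive cut positions.
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Hypothetical 21-base sequence containing two EcoRI sites.
toy = "AAAGAATTCCCCCGAATTCTT"
fragments = digest_linear(toy, "GAATTC", 1)
print(fragments)
```

For the toy sequence this gives fragments of 4, 10 and 7 bases, which sum to the input length. A circular genome would need the first and last fragments joined.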

Genome comparisons:

GeneOrder 2.0 (D. Seto, Bioinformatics & Computational Biology, George Mason Univ., U.S.A.) is ideal for comparing small GenBank genomes (up to 0.25 Mb), while GeneOrder 3.0 extends the limit to approx. 2.0 Mb. Each gene from the Query sequence is compared to all of the genes from the Reference database using BLASTP. There are two display formats: graphical and tabular. Currently the graph is displayed by an applet and can only be saved as a screenshot.

CoreGenes  (D. Seto,  Bioinformatics & Computational Biology, George Mason Univ., U.S.A.) is designed to analyze two to five genomes simultaneously, generating a table of related genes - orthologs and putative orthologs. These entries are linked to their GenBank data.  It has a limit of 0.35 Mb, while the newer version CoreGenes 2.0 extends the limit to  approx. 2.0Mb. If your data is not present in GenBank use this site.

CoreGenes 3 (D. Seto & P. Mahadevan, Bioinformatics & Computational Biology, George Mason Univ., U.S.A.) - tallies the total number of genes in common between the two genomes being compared; displays the percentage of genes in common with a specific genome; and determines the unique genes contained in a pair of proteomes.
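
The three tallies described for CoreGenes 3 amount to set operations once related genes have been identified. The sketch below only illustrates that bookkeeping: it assumes each proteome has already been reduced to a set of shared family labels (the real tool derives relatedness from sequence comparison), and the function name `compare_proteomes` and the phage gene names are hypothetical.

```python
def compare_proteomes(ref, query):
    """Summarise shared and unique gene families between two proteomes.

    ref and query are sets of gene-family labels (an assumption of this
    sketch; no sequence comparison is performed here).
    """
    shared = ref & query
    return {
        "shared": sorted(shared),
        # Percentage of the reference proteome that is also in the query.
        "percent_of_ref": 100.0 * len(shared) / len(ref) if ref else 0.0,
        "unique_to_ref": sorted(ref - query),
        "unique_to_query": sorted(query - ref),
    }


# Hypothetical toy proteomes.
phage_a = {"terminase", "portal", "capsid", "lysin"}
phage_b = {"terminase", "portal", "integrase"}
result = compare_proteomes(phage_a, phage_b)
print(result)
```

Here two of the four reference families are shared, so the reported percentage is 50.0.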

WebACT - this is the web version of ACT (Artemis Comparison Tool), a DNA sequence comparison viewer based on Artemis (Reference: T.J. Carver et al. Bioinformatics 21: 3422-3423). To find the EMBL accession number for the sequence you are interested in, visit the database page of EMBL-EBI and select EMBL and "Standard Query Form".

 WebGMAP - is a public web service for annotating and mapping individual cDNA sequences to the genomes of many eukaryote species, currently including Arabidopsis thaliana, Chlamydomonas reinhardtii, Glycine max, Oryza sativa, Physcomitrella patens and Populus trichocarpa. (Reference: C. Liang et al. 2009. Nucl. Acids Res. 37(Web Server issue):W77-W83)

Genome annotation and/or visualization:

BASys Bacterial Annotation Tool - this incredible tool supports automated, in-depth annotation of bacterial genomic sequences. It accepts raw DNA sequence data and an optional list of gene identification information (Glimmer) and provides extensive textual annotation and hyperlinked image output. BASys uses >30 programs to determine 60 annotation subfields for each gene, including gene/protein name, GO function, COG function, possible paralogues and orthologues, molecular weight, isoelectric point, operon structure, subcellular localization, signal peptides, transmembrane regions, secondary structure, 3D structure, reactions and pathways. (Reference: G.H. Van Domselaar et al. 2005. Nucl. Acids Res. 33(Web Server issue):W455-W459).

ORF (Groningen Biomolecular Sciences and Biotechnology Institute, Haren, the Netherlands) - offers a choice of Glimmer, ZCurve or GeneMark predictions coupled with GenBank or Fasta-formatted output. Works very well and quickly with phage-sized genomes.

BAGEL (Groningen Biomolecular Sciences and Biotechnology Institute, Haren, the Netherlands) - determines from an existing or unsubmitted GenBank file the presence of bacteriocins, based on a database of known bacteriocins and adjacent genes involved in bacteriocin activity.

MICheck (MIcrobial genome Checker) - enables rapid verification of sets of annotated genes and frameshifts in previously published bacterial genomes, or genomes for which the user has a *.gbk file. This tool can be seen as a preliminary step before the functional re-annotation step to check quickly for missing or wrongly annotated genes. It worked nicely with phage genomes from 43-135 kb. (Reference: S. Cruveiller et al. 2005. Nucl. Acids Res. 33: W471-W479).

RibEx: Riboswitch Explorer - scans <40 kb DNA for potential genes (which are linked to BLASTP) and several hundred regulatory elements, including riboswitches. If you click on "search for attenuators" it finds terminators and antiterminators. It presents the calculated genes and permits BLAST analysis at NCBI (Reference: C. Abreu-Goodger & E. Merino. 2005. Nucl. Acids Res. 33: W690-W692).

TransTerm (Michael Nuhn, Nano+Bio-Center) - TransTerm searches for rho-independent terminators in the vicinity of annotated genes. This TIGR program can be accessed online in two ways. If you have the genome in GenBank format, use this program, since it will only look for terminators in the vicinity of the annotated genes. If the genome has not been annotated, use this site. The latter site combines Glimmer and RBSfinder with TransTerm.

tRNAs: tRNAscan-SE (University of California at San Diego, U.S.A.) and FAStRNA (N. El-Mabrouck, Pasteur Institute, Paris, France). The former site is incredibly sensitive & also provides secondary structure diagrams of the tRNA molecules. Alternatively use ARAGORN (Reference: Laslett, D. & Canback. 2004. Nucleic Acids Research 32:11-16).
Test sequences.

CRISPRfinder - Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) are a curious repeat structure found in many prokaryotic genomes. They show characteristics of both tandem and interspaced repeats. (Reference: I. Grissa et al. 2007. Nucl. Acids Res. 35(Web Server issue): W52-W57).

GenomeVx - makes editable, publication-quality maps of mitochondrial and chloroplast genomes and of large plasmids. These maps show the location of genes and chromosomal features as well as a position scale. The program takes as input either raw feature positions or GenBank records. In the latter case, features are automatically extracted and colored, an example of which is given. Output is in the Adobe Portable Document Format (PDF) and can be edited by programs such as Adobe Illustrator. (Reference: G. Conant & K. Wolfe. 2008. Bioinformatics 24:861-862)

LTR_Finder - is an efficient program for finding full-length LTR retrotransposons in genome sequences. The size of the input file is now limited to 50 MB (Reference: Z. Xu & H. Wang. 2007. Nucl. Acids Res. 35(Web Server issue): W265-W268).

RTAnalyzer - finds retrotransposons and detects L1 retrotransposition signatures (Reference: J-F. Lucier et al. 2007. Nucl. Acids Res. 35(Web Server issue): W269-W274).

FancyGene - is a fast and user-friendly web-based tool for producing images of one or more genes directly on the corresponding genomic locus. Starting from a variety of input formats, FancyGene rebuilds the basic components of a gene (UTRs, introns, exons). Once the initial representation is obtained, the user can superimpose additional features, such as protein domains and/or a variety of biological markers, in specific positions. (Reference: D. Rambaldi & F.D. Ciccarelli. 2009. Bioinformatics 25: 2281-2282).

DNAPlotter - is an interactive Java application for generating circular and linear representations of genomes. Making use of the Artemis libraries to provide a user-friendly method of loading in sequence files (EMBL, GenBank, GFF) as well as data from relational databases, it filters features of interest to display on separate user-definable tracks. It can be used to produce publication quality images for papers or web pages. (Reference: Carver, T. et al. 2008. Bioinformatics 25:119-120)

Genomic islands:

MobilomeFINDER - web-based tools for in silico and experimental discovery of bacterial genomic islands (Reference: H-Y. Ou et al. Nucl. Acids Res. 35(Web Server issue): W97-W104)

Prophage Finder - this tool predicts potential prophage loci in prokaryotic genome sequences.  However, it does not make any predictions as to whether the identified prophage is functional and it is also important to note the identified prophage region will most likely not represent the entire prophage. (Reference: Bose, M. & Barber, R. 2006.  In Silico Biol. 6: 0020).

 Phage_Finder - was created to identify prophage regions in completed bacterial genomes. Using a test dataset of 42 bacterial genomes whose prophages have been manually identified, Phage_Finder found 91% of the regions, resulting in 7% false positive and 9% false negative prophages. A search of 302 complete bacterial genomes predicted 403 putative prophage regions, accounting for 2.7% of the total bacterial DNA. Analysis of the 285 putative attachment sites revealed tRNAs are targets for integration slightly more frequently (33%) than intergenic (31%) or intragenic (28%) regions, while tmRNAs were targeted in 8% of the regions. (Reference: D.E. Fouts. 2006. Nucleic Acids Res. 34: 5839–5851).

 Prophinder

 IslandViewer - integrates two sequence composition GI prediction methods SIGI-HMM and IslandPath-DIMOB, and a single comparative GI prediction method IslandPick (Reference: M.G.I. Langille et al. 2008. BMC Bioinformatics 9: 329).

 Synthetic genes:

GeneDesign - is an excellent resource for designing synthetic genes. It includes tools for codon optimization and removal of restriction sites (Reference: Richardson, S.M. et al. 2006. Genome Research 16:550-556).
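
Removing a restriction site from a coding sequence without changing the protein can be sketched as a search over synonymous codons. This is an illustration of the general idea only, not GeneDesign's algorithm: the helper names `translate` and `remove_site` are hypothetical, the standard genetic code (NCBI translation table 1) is assumed, and codon usage is ignored.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Group codons by the amino acid they encode.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(cds):
    """Translate an in-frame coding sequence with the standard code."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def remove_site(cds, site):
    """Swap in synonymous codons until `site` no longer occurs in `cds`."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    while site in "".join(codons):
        current = "".join(codons)
        pos = current.find(site)
        first, last = pos // 3, (pos + len(site) - 1) // 3
        fixed = False
        for idx in range(first, last + 1):
            for alt in SYNONYMS[CODON_TABLE[codons[idx]]]:
                if alt == codons[idx]:
                    continue
                trial = codons[:idx] + [alt] + codons[idx + 1:]
                # Accept only changes that strictly reduce the site count,
                # which also guarantees the loop terminates.
                if "".join(trial).count(site) < current.count(site):
                    codons, fixed = trial, True
                    break
            if fixed:
                break
        if not fixed:
            raise ValueError("no synonymous change removes the site")
    return "".join(codons)


# Hypothetical toy gene (Met-Glu-Phe-stop) containing an EcoRI site.
original = "ATGGAATTCTAA"
cleaned = remove_site(original, "GAATTC")
print(cleaned, translate(cleaned))
```

The cleaned sequence encodes the same protein as the original and no longer contains GAATTC. A production tool would also weigh codon usage in the host organism when choosing among synonyms.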

 Metagenomics:

Orphelia - Orphelia is a metagenomic ORF finding tool for the prediction of protein coding genes in short, environmental DNA sequences with unknown phylogenetic origin. Orphelia is based on a two-stage machine learning approach introduced by its authors. After the initial extraction of ORFs, linear discriminants are used to extract features from those ORFs. Subsequently, an artificial neural network combines the features and computes a gene probability for each ORF in a fragment. A greedy strategy computes a likely combination of high scoring ORFs with an overlap constraint. (Reference: K.J. Hoff et al. 2009. Nucl. Acids Res. 37(Web Server issue): W101-W105)
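
The final greedy step can be illustrated with a short sketch. It is not Orphelia's implementation: the function name `greedy_select`, the probability cutoff `min_prob` and the overlap limit `max_overlap` are hypothetical values chosen for illustration, and the gene probabilities are assumed to have been computed already.

```python
def greedy_select(orfs, min_prob=0.5, max_overlap=60):
    """Pick high-scoring ORFs greedily, subject to an overlap constraint.

    orfs is a list of (start, end, probability) tuples with half-open
    coordinates. min_prob and max_overlap are assumptions of this sketch.
    """
    chosen = []
    # Visit candidates from the most to the least probable.
    for start, end, prob in sorted(orfs, key=lambda o: o[2], reverse=True):
        if prob < min_prob:
            break
        # Overlap length is negative when two intervals are disjoint.
        if all(min(end, e) - max(start, s) <= max_overlap
               for s, e, _ in chosen):
            chosen.append((start, end, prob))
    return sorted(chosen)


# Hypothetical candidate ORFs on one fragment.
candidates = [(0, 300, 0.9), (250, 600, 0.8), (100, 400, 0.7), (580, 700, 0.3)]
selected = greedy_select(candidates)
print(selected)
```

With these toy values the first two candidates are kept (they overlap by 50 bases), the third is rejected because it overlaps the best ORF by 200 bases, and the fourth falls below the probability cutoff.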