Genomics

N.B. Many of the tools that one needs for the analysis of genomes can be found in the DNA Sequence Analysis section. Listed here are tools specific to genomic analysis which do not fit easily in that section. Two excellent internet resources are at the Danish Technical University, specifically DTU Health Tech and the Center for Genomic Epidemiology.

1. DNA sequencing
2. Sequencing errors
3. Genome annotation
4. Correcting genome annotations
5. Specialized annotation - general (inteins, plasmids, typing, vaccine candidates)
6. Two-component and other regulatory proteins
7. Orthologous genes/proteins
8. Specialized annotation - antibiotic resistance
9. Specialized annotation - CRISPR
10. Specialized annotation - virulence determinants
11. Specialized annotation - Genomic Islands
12. Genome comparisons and synteny
13. Phylogeny (AAI and ANI)
14. Genome visualization
15. Synthetic genes
16. Metagenomics
17. Meta sites
18. Naming your bacteriophage
19. Other useful phage resources

DNA sequencing:

DNA Sequence Quality - Phred - provides base calling, chromatogram display and high quality sequence region evaluation and presentation for up to five sequences simultaneously.
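
Phred-style quality values relate a score Q to the estimated base-call error probability P by Q = -10 log10(P). The following is a minimal sketch of how a high-quality region could be located from such scores. It assumes FASTQ-style Phred+33 text encoding and a cut-off of Q20, neither of which is stated above, and the function names are illustrative only; it is not the Phred program itself.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to an estimated base-call error probability."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string):
    """Decode a quality string, assuming Phred+33 ASCII encoding (an assumption)."""
    return [ord(ch) - 33 for ch in quality_string]


def longest_high_quality_region(scores, min_q=20):
    """Return (start, end) of the longest run where every score >= min_q.

    The end index is exclusive; (0, 0) means no base passed the cut-off.
    """
    best = (0, 0)
    start = None
    # A sentinel score of -1 closes a run that reaches the end of the read.
    for i, q in enumerate(list(scores) + [-1]):
        if q >= min_q and start is None:
            start = i
        elif q < min_q and start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best
```

For example, `longest_high_quality_region(decode_phred33("!!IIII!II"))` picks out the four consecutive high-scoring bases in the middle of the read.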

Sequence assembly - you don't need your own contig assembly program when you can use:

Galaxy for Genome Assembly - The Genome Assembly Workbench is a comprehensive set of analysis tools and consolidated workflows to assist in Genome Assembly. The workbench is based on the Galaxy framework, which guarantees simple access, easy extension, flexible adaptation to personal and security needs, and sophisticated analyses independent of command-line knowledge. Requires free registration.

CAP3 (PBIL, France) (Reference: Huang, X. & Madan, A. 1999. Genome Res. 9: 868-877).

MicroScope web site (hosted at Genoscope), provides an environment for expert annotation and comparative genomics. Genome project: Annotation and comparative analyses of finished or draft genome sequences. For pre-annotated sequences, they only integrate annotations from NCBI RefSeq complete genome section. Metagenome project: Annotation and comparative analyses of assembled metagenomic sequences. Currently, they are able to integrate datasets below 20 Mb of contigs per bin.

EGassembler - aligns and merges sequence fragments resulting from shotgun sequencing or gene transcripts (EST) fragments in order to reconstruct the original segment or gene (Reference: A. Masoudi-Nejad et al. 2006. Nucl. Acids Res. 34: W459-462).

NanoPipe - was developed in consideration of the specifics of the MinION sequencing technologies, providing accordingly adjusted alignment parameters. The range of the target species/sequences for the alignment is not limited, and the descriptive usage page of NanoPipe helps a user to succeed with NanoPipe analysis. The results contain alignment statistics, consensus sequence, polymorphisms data, and visualization of the alignment. (Reference: Shabardina V et al. (2019) Gigascience 8(2). pii: giy169).

COV2HTML: a visualization and analysis tool of bacterial next generation sequencing (NGS) data for postgenomics life scientists - allows performing both coverage visualization and analysis of NGS alignments performed on prokaryotic organisms (bacteria and phages). It combines two processes: a tool that converts the huge NGS mapping or coverage files into light specific coverage files containing information on genetic elements; and a visualization interface allowing a real-time analysis of data with optional integration of statistical results. (Reference: Monot M. et al. 2014. OMICS 18(3): 184-95).

PhageTerm - is a fast and user-friendly software package which can be used to determine bacteriophage termini and packaging mode from randomly fragmented NGS data. It is part of the Galaxy package, and can be found in the "NGS: Mapping" directory. Ideal if you want an automated answer. (Reference: Garneau JR, et al. 2017. Sci Rep. 7(1):8292).

Sequencing errors: - if your DNA sequence doesn't match the expected protein sequence you can check for errors using BLASTx, which compares the six-frame translation of a DNA sequence against protein sequences; a frameshifting error typically shows up as an alignment that switches reading frame partway through the gene. Other programs include:

FrameD - is a program that predicts coding regions in prokaryotic and matured eukaryotic sequences. Initially targeted at gene prediction in bacterial GC rich genomes, the gene model used in FrameD also allows prediction of genes in the presence of frameshifts and partially undetermined sequences, which makes it very suitable for gene prediction and frameshift correction in unfinished sequences such as EST and EST cluster sequences. Like recent eukaryotic gene prediction programs, FrameD also includes the ability to take into account protein similarity information both in its prediction and its graphical output. (Reference: T. Schiex et al. 2003. Nucl. Acids Res. 31: 3738-3741)

PATH: protein back-translation and alignment - addresses the problem of finding distant protein homologies where the divergence is the result of frameshift mutations and substitutions. Given two input protein sequences, the method implicitly aligns all the possible pairs of DNA sequences that encode them, by manipulating memory-efficient graph representations of the complete set of putative DNA sequences for each protein. (Reference: Gîrdea M et al. 2010. Algorithms for Molecular Biology 5:)

In-silico.com (Dr. Joseba Bikandi & co-workers, Faculty of Pharmacy, University of the Basque Country) - allows in silico experiments including theoretical PCR amplification, AFLP-PCR, restriction analysis and pulsed field gel electrophoresis [PFGE] with bacterial & archaeal genomes found in the public database.
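
Frameshift checks of the kind described in this section rest on translating the DNA in all six reading frames: when a base has been inserted or deleted, the expected protein continues in a different frame. The sketch below illustrates only that translation step, using the standard genetic code; the function names are illustrative and this is not how any of the listed programs is implemented.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq, offset=0):
    """Translate seq from the given offset; unknown codons become 'X'."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(seq):
    """Return translations for frames +1..+3 and -1..-3."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq, offset)
        frames[f"-{offset + 1}"] = translate(rc, offset)
    return frames
```

Comparing each of the six translations with the expected protein shows where the match jumps from one frame to another, which is the usual signature of an insertion or deletion error.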

Genome annotation:

DFAST - is a very quick prokaryotic genome annotation pipeline providing rich information on pseudogenes, translation exceptions and orthologous gene assignment between given reference genomes. DFAST also supports genome submission to public sequence databases (Reference: Tanizawa Y et al. (2018) Bioinformatics. 34(6): 1037-1039). One of my favourite annotation pipelines due to its speed and simplicity.

Bakta web server - is a user-friendly web interface for conducting and visualizing annotations using Bakta without requiring command line expertise or local computing resources. Key features include interactive visualizations through circular genome plots, linear genome browsers, and searchable data tables facilitating the interpretation of complex annotation results. The web server generates standard bioinformatics outputs (GFF3, GenBank, EMBL) and annotates diverse genomic features, including coding sequences, non-coding RNAs, and small open reading frames (sORFs) (Reference: Beyvers S et al. (2025) Nucleic Acids Research 53(W1): W51–W56). Also available at Galaxy.eu. Requires registration.

pharokka - provides annotations in a fast, scalable and consistent fashion. Pharokka identifies predicted coding sequences (CDS), transfer RNAs (tRNAs), transfer-messenger RNAs (tmRNAs) and clustered regularly interspaced short palindromic repeats (CRISPRs), providing functional annotation for CDS using the PHROGs database (Reference: Bouras G et al. (2023) Bioinformatics, 39(1): btac776). Also available at Google Colab. Requires registration.

Proksee - provides users with a powerful, easy-to-use, and feature-rich system for assembling, annotating, analysing, and visualizing bacterial genomes. Proksee accepts Illumina sequence reads as compressed FASTQ files or pre-assembled contigs in raw, FASTA, or GenBank format. Alternatively, users can supply a GenBank accession or a previously generated Proksee map in JSON format. Proksee then performs assembly (for raw sequence data), generates a graphical map, and provides an interface for customizing the map and launching further analysis jobs. Notable features of Proksee include unique and informative assembly metrics provided via a custom reference database of assemblies; a deeply integrated high-performance genome browser for viewing and comparing analysis results at individual base resolution (developed specifically for Proksee); an ever-growing list of embedded analysis tools whose results can be seamlessly added to the map or searched and explored in other formats; and the option to export graphical maps, analysis results, and log files for data sharing and research reproducibility (Reference: Grant JR et al (2023) Nucleic Acids Res. 51(W1): W484-W492.)

RAST (Rapid Annotation using Subsystem Technology) is a fully-automated service for annotating bacterial and archaeal genomes. It provides high quality genome annotations for these genomes across the whole phylogenetic tree. Requires registration. (Reference: Aziz, RK et al. 2008. BMC Genomics 9:75.).

BV-BRC (Bacterial and Viral Bioinformatics Resource Center) - provides public access to computational platforms and analysis tools that enable the collection and integration of genomics and related research data relevant to infectious diseases, pathogens, and their interaction with host organisms (Reference: Olson RD et al. (2022) Nucleic Acids Res 51(D1): D678–D689).

BASys2 (Bacterial Annotation System 2.0) is a freely available, powerful web server for comprehensive bacterial genome annotation. It identifies all gene types (protein-coding, tRNA, rRNA, etc.) and generates up to 62 annotation fields per gene using over 30 tools and 10 databases. The interactive genome viewer provides detailed, multi-resolution visualizations and clickable gene cards, while also supporting metabolome annotations and 3D protein structure visualizations. Annotations include structural, functional, and statistical data, with results available for download in JSON and GenBank formats. (Reference: Poelzer J et al. 2025. Nucl. Acids Res. 53(W1): W57-W67).

MicroScope - (CEA, Institut de Génomique - Genoscope, France) is a microbial genome annotation & analysis platform which provides access to a wide range of tools including COG analysis, comparative genomics ... (Reference: Vallenet D et al. (2017) Nucleic Acids Res. 45(D1): D517-D528). Requires registration.

MAKER Web Annotation Service (MWAS) is an easily configurable web-accessible genome annotation pipeline. Its purpose is to allow research groups with small to intermediate amounts of eukaryotic and prokaryotic genome sequence (i.e. BAC clones, small whole genomes, preliminary sequencing data, etc.) to independently annotate and analyse their data and produce output that can be loaded into a genome database. (Reference: Holt, C. & Yandell, M. 2011. BMC Bioinformatics 12:491).

MITOS2 (part of Galaxy.org) - is a pipeline designed to provide consistent and high quality de novo annotation of Metazoan mitochondrial genome sequences. The authors show that the results of MITOS match RefSeq and MitoZoa in terms of annotation coverage and quality, while avoiding the biases, inconsistencies of nomenclature, and typos originating from manual curation strategies. (Reference: M. Bernt et al. 2013. Molecular Phylogenetics & Evolution 69:313-319).

GenSAS - Genome Sequence Annotation Server - provides a one-stop website with a single graphical interface for running multiple structural and functional annotation tools, enabling visualization and manual curation of genome sequences. Users can upload sequences into their account and run gene prediction programs, protein homology searches, map ESTs, identify repeats, ORFs and SSRs with custom parameter settings. Each analysis is displayed on separate tracks of the graphical interface with custom editable tracks to select final annotation of features and create GFF3 files for upload to genome browsers such as GBrowse. Additional programs can be easily added using this Drupal based software.

FLAN (FLu ANnotation) is an NCBI web server for annotating user-provided influenza A virus or influenza B virus sequences. It can validate and predict protein sequences encoded by an input flu sequence. (Reference: Y. Bao et al. 2007. Nucleic Acids Res. 35 (Web Server issue): W280-W284).

Genome Annotation Transfer Utility (GATU) annotates a genome based on a very closely related reference genome. The proteins/mature peptides of the reference genome are BLASTed against the genome to be annotated in order to find the genes/mature peptides in the genome to be annotated (Reference: T. Tcherepanov et al. 2006. BMC Genomics 7:150.)

BioGPS (The Scripps Research Institute, USA) - is a one-stop gene annotation portal that emphasizes user-customizability and community-extensibility. It is a complete resource for learning about gene and protein function.

BAGEL (Groningen Biomolecular Sciences and Biotechnology Institute, Haren, the Netherlands) - will determine from an existing or unsubmitted GenBank file the presence of bacteriocins based on a database containing information of known bacteriocins and adjacent genes involved in bacteriocin activity. See LABiocin if you are interested in the topic of Lactic Acid Bacteria (LAB) and their bacteriocins.

tRNAs: tRNAscan-SE - is incredibly sensitive & also provides secondary structure diagrams of the tRNA molecules (Reference: Schattner, P. et al. 2005. Nucleic Acids Res. 33: W686-689). Alternatively use ARAGORN (Reference: Laslett, D. & Canback. 2004. Nucleic Acids Research 32:11-16).
Test sequences.

MG-RAST (Metagenome Rapid Annotation using Subsystem Technology) is a fully-automated service for annotating metagenome samples. It provides annotation of sequence fragments, their phylogenetic classification and an initial metabolic reconstruction. The service also provides means for comparing phylogenetic classifications and metabolic reconstructions of metagenomes (Reference: F. Meyer et al. 2008. BMC Bioinformatics 9: 386).

The following programs can be used to predict phage proteins:

PVPred (Reference: Ding H et al (2014) Mol Biosyst 10(8): 2229-2235).
PHPred (Reference: Ding H (2016) Computers Biol Med 71: 156–161).

Chromosome replication origin:

Ori-Finder - is a useful platform for the identification and analysis of replication origins (oriCs) in the bacterial genomes. (Reference: Luo H et al. (2019) Brief Bioinform 20(4): 1114-1124). Please note that these tools have been used to create DoriC - a database of replication origins in prokaryotic genomes including chromosomes and plasmids. (Reference: Luo H & Gao F (2019) Nucleic Acids Res. 47(D1): D74-D77).

Correcting genome annotations:

One of the problems with GenBank is that scientists do not update their submission data nor correct errors. In part this is due to laziness; but is also due to the fact that GenBank is, in most cases, unwilling to accept a new version of the Sequin file. Tbl2asn is a command-line program that automates the creation of sequence records for submission to GenBank but, from my perspective, it is not easy to use. The only online program is GenBank 2 Sequin which generates not only a Sequin file (*.sqn), but also a five-column "Annotation Table" (*.tbl). This together with the fasta-formatted DNA sequence can be submitted to GenBank by Email (gb-admin@ncbi.nlm.nih.gov). In its absence I recommend the perl script gbf2tbl.pl available for downloading here.
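
For orientation, the five-column "Annotation Table" is a tab-delimited text file: a `>Feature` header line naming the sequence, then one line per feature giving start, stop and feature key, followed by qualifier lines that leave the first three columns empty. The sketch below writes such a table from a small list of features. The function name and the dictionary layout are hypothetical, it covers only simple single-interval features, and it is not a replacement for the tools named above.

```python
def feature_table(seq_id, features):
    """Build a five-column feature table as a string.

    Each feature is a dict with 'start', 'end', 'key' and optional
    'strand' ('+' or '-') and 'qualifiers' (dict of name -> value).
    Minus-strand features are written with start and stop swapped.
    """
    lines = [f">Feature {seq_id}"]
    for feat in features:
        start, end = feat["start"], feat["end"]
        if feat.get("strand", "+") == "-":
            start, end = end, start
        # Columns 1-3: start, stop, feature key.
        lines.append(f"{start}\t{end}\t{feat['key']}")
        # Columns 4-5: qualifier name and value, first three columns empty.
        for name, value in feat.get("qualifiers", {}).items():
            lines.append(f"\t\t\t{name}\t{value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = [
        {"start": 100, "end": 400, "strand": "-", "key": "CDS",
         "qualifiers": {"product": "hypothetical protein"}},
    ]
    print(feature_table("phage1", example), end="")
```

Writing minus-strand features with the larger coordinate first is how the table encodes orientation, so no separate strand column is needed.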

Specialized annotation - general

PlasmidFinder 2 - identifies plasmids in totally or partially sequenced isolates of bacteria. The method uses BLAST for identification of replicons of plasmids belonging to the major incompatibility (Inc) groups of Enterobacteriaceae. As input, the method can use both pre-assembled, complete or partial genomes, and short sequence reads from four different sequencing platforms. See also pMLST 2.0 (Reference: Carattoli A et al. 2014. Antimicrob. Agents Chemother. 58: 3895-903)

HostPhinder 1.1 (Danish Technical University) - identifies the bacterial host of a query phage genome based on its genomic similarity to a database of phage genomes with known host.

SpeciesFinder 2.0 (Danish Technical University) - predicts the species of bacteria from pre-assembled, complete or partial genomes, and short sequence reads. The prediction is based on the 16S rRNA gene.

CSI Phylogeny 1.4 (Call SNPs & Infer Phylogeny; Danish Technical University) - calls SNPs, filters the SNPs, does site validation and infers a phylogeny based on the concatenated alignment of the high quality SNPs. (Reference: Kaas, R.S. et al. PLoS ONE 2014; 9: e104984.)
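
To make the idea of a "concatenated alignment of SNPs" concrete, the sketch below takes sequences that are already aligned to the same coordinates, keeps only the columns where every genome has an unambiguous base and at least two genomes differ, and joins those columns into one short sequence per genome. The function name and the filtering rule are assumptions for illustration; the service's own SNP calling and validation steps are more involved.

```python
def concatenated_snp_alignment(aligned):
    """Extract variable, unambiguous columns from equal-length aligned sequences.

    aligned: dict mapping genome name -> aligned sequence.
    Returns (positions, snp_alignment), where positions are 0-based column
    indices and snp_alignment maps each name to its concatenated SNP bases.
    """
    names = list(aligned)
    if not names:
        return [], {}
    length = len(aligned[names[0]])
    if any(len(aligned[name]) != length for name in names):
        raise ValueError("all sequences must have the same aligned length")

    positions = []
    for pos in range(length):
        column = [aligned[name][pos].upper() for name in names]
        # Simple site validation: discard columns with gaps or ambiguity codes.
        if any(base not in ("A", "C", "G", "T") for base in column):
            continue
        # Keep the column only if the genomes disagree there.
        if len(set(column)) > 1:
            positions.append(pos)

    snp_alignment = {
        name: "".join(aligned[name][pos].upper() for pos in positions)
        for name in names
    }
    return positions, snp_alignment
```

The resulting short alignment is the kind of input a tree-building program would then use.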

KmerFinder 3.2 (Danish Technical University) – predicts the species of bacteria from pre-assembled, complete or partial genomes, and short sequence reads. The prediction is based on the number of co-occurring k-mers (substrings of k nucleotides in DNA sequence data, in this case 16-mers) between the genomes of reference bacteria in a database and the genome provided by the user. (Reference: Hasman H et al. 2013. J Clin Microbiol. 52:139-146)
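
The k-mer idea can be sketched in a few lines: break the query and each reference genome into overlapping 16-mers and rank the references by how many k-mers they share with the query. This is a naive illustration with hypothetical function names, not the scoring scheme the service actually uses.

```python
def kmers(sequence, k=16):
    """Return the set of overlapping k-mers in a sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def rank_references(query, references, k=16):
    """Rank reference genomes by the number of k-mers shared with the query.

    references: dict mapping reference name -> sequence.
    Returns a list of (name, shared_kmer_count), best match first.
    """
    query_kmers = kmers(query, k)
    scores = {
        name: len(query_kmers & kmers(seq, k))
        for name, seq in references.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With real genomes the reference k-mer sets would be computed once and stored, since rebuilding them for every query is the slow part.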

VIOLIN - Vaccine Investigation and Online Information Network - allows easy curation, comparison and analysis of vaccine-related research data across various human pathogens. VIOLIN is expected to become a centralized source of vaccine information and to provide investigators in basic and clinical sciences with curated data and bioinformatics tools for vaccine research and development. VBLAST: Customized BLAST Search for Vaccine Research allows various search strategies against 77 genomes of 34 pathogens. (Reference: He, Y. et al. 2014. Nucleic Acids Res. 42 (Database issue):D1124-32).

MLST 2.0 (MultiLocus Sequence Typing) - currently only works with assembled genomes and contigs (Reference: Larsen MV et al. 2012. J. Clin. Microbiol. 50: 1355-1361).

BacWGSTdb - incorporates extensive resources for bacterial genome sequencing data and the corresponding metadata, combined with specialized bioinformatics tools that enable the systematic characterization of the bacterial isolates recovered from infections: (i) the integration of the core genome multi-locus sequence typing (cgMLST) approach; (ii) the addition of a multiple genome analysis module that can process dozens of user uploaded sequences in a batch mode; (iii) a new source tracking module for comparing plasmid sequences to those deposited in the public databases; and (iv) the number of species encompassed in BacWGSTdb 2.0 has increased from 9 to 20. (Reference: Feng Y et al (2021) Nucleic Acids Research. 2016; 44(D1): D682-D687).

InBase, The Intein Database and Registry (legacy hosted by Hideo Iwai lab) - Protein splicing is defined as the excision of an intervening protein sequence (the INTEIN) from a protein precursor and the concomitant ligation of the flanking protein fragments (the EXTEINS) to form a mature extein host protein and the free intein (Perler 1994). Protein splicing results in a native peptide bond between the ligated exteins. This is a database site which permits BLAST analysis. (Reference: Perler, F.B. 2002. Nucleic Acids Res. 30: 383-384).

Two-component and other regulatory proteins:

P2RP (Predicted Prokaryotic Regulatory Proteins) - users can input amino acid or genomic DNA sequences, and predicted proteins therein are scanned for the possession of DNA-binding domains and/or two-component system domains. RPs identified in this manner are categorised into families, unambiguously annotated. (Reference: Barakat M, et al. 2013. BMC Genomics 14:269).

P2CS (Prokaryotic 2-Component Systems) is a comprehensive resource for the analysis of Prokaryotic Two-Component Systems (TCSs). TCSs are comprised of a receptor histidine kinase (HK) and a partner response regulator (RR) and control important prokaryotic behaviors. It can be searched using BLASTP. (Reference: P. Ortet et al. 2015. Nucl. Acids Res. 43 (D1): D536-D541).

Orthologous genes/proteins

COG analysis - Clusters of Orthologous Groups - COG protein database was generated by comparing predicted and known proteins in all completely sequenced microbial genomes to infer sets of orthologs. Each COG consists of a group of proteins found to be orthologous across at least three lineages and likely corresponds to an ancient conserved domain. Sites which offer this analysis include:

RAST (Reference: Aziz RK et al. 2008. BMC Genomics 9:75), and BASys (Bacterial Annotation System; Reference: Van Domselaar GH et al. 2005. Nucleic Acids Res. 33(Web Server issue):W455-459.) and JGI IMG (Integrated Microbial Genomes; Reference: Markowitz VM et al. 2014. Nucl. Acids Res. 42: D560-D567. )

Other sites:

EggNOG - A database of orthologous groups and functional annotation that derives Nonsupervised Orthologous Groups (NOGs) from complete genomes, and then applies a comprehensive characterization and analysis pipeline to the resulting gene families. (Reference: Powell S et al. 2014. Nucleic Acids Res. 42 (D1): D231-D239).

KAAS (KEGG Automatic Annotation Server) provides functional annotation of genes by BLAST or GHOST comparisons against the manually curated KEGG GENES database. The result contains KO (KEGG Orthology) assignments and automatically generated KEGG pathways. (Reference: Moriya Y et al. 2007. Nucleic Acids Res. 35(Web Server issue):W182-185).

PHROGS (PHage Remote Orthologous GroupS) - is a library of 38,880 viral protein families generated using a new clustering approach based on remote homology detection by HMM profile-profile comparisons. (Reference: Terzian P et al. 2021. NAR Genom Bioinform. 3(3): lqab067).

Specialized annotation - antibiotic resistance.

ResFinder 4.1 (Danish Technical University) - uses BLAST for identification of acquired antimicrobial resistance genes in whole-genome data. As input, the method can use both pre-assembled, complete or partial genomes, and short sequence reads from four different sequencing platforms. Tested with 1411 different resistance genes with 100% identity. (Reference: Zankari E et al. 2012. J Antimicrob Chemother. 67:2640-2644)

ResFinderFG 2.0 (Danish Technical University) - identifies a resistance phenotype based on a functional metagenomic antibiotic resistance determinants database.

CARD (The Comprehensive Antibiotic Resistance Database) - a rigorously curated collection of known resistance determinants and associated antibiotics, organized by the Antibiotic Resistance Ontology (ARO) and AMR gene detection models (Reference: Jia, B. et al. 2017. Nucleic Acids Research, 45: D566-573).

BacMet (Antibacterial Biocide & Metal Resistance Genes Database) - a database of biocide and metal resistance genes with highly reliable content. In BacMet version 1.1, the experimentally confirmed database contains 704 resistance genes, whereas the predicted database contains 40,556 resistance genes (Reference: Pal, C. et al. 2014. Nucleic Acids Research, 42: D737-743).

Specialized annotation - CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) and anti-CRISPR:

CRISPRfinder - enables the easy detection of CRISPRs in locally-produced data and consultation of CRISPRs present in the database. It also gives information on the presence of CRISPR-associated (cas) genes when they have been annotated as such. (Reference: I. Grissa et al. 2007. Nucl. Acids Res. 35 (Web Server issue): W52-W57).

CRISPRmap - provides a quick and detailed insight into repeat conservation and diversity of both bacterial and archaeal systems. It comprises the largest dataset of CRISPRs to date and enables comprehensive independent clustering analyses to determine conserved sequence families, potential structure motifs for endoribonucleases, and evolutionary relationships. (Reference: S.J. Lange et al. 2013. Nucleic Acids Research, 41: 8034-8044).

CRISPI: a CRISPR Interactive database - includes a complete repertory of CRISPR-associated genes (CAS). A user-friendly web interface with many graphical tools and functions allows users to extract results, find CRISPR in personal sequences or calculate sequence similarity with spacers. (Reference: Rousseau C et al. 2009. Bioinformatics. 25: 3317–3318).

CRISPRTarget - predicts the most likely targets of CRISPR RNAs. This can be used to discover targets in newly sequenced genomic or metagenomic data. (Reference: Biswas A et al. 2013. RNA Biol. 10:817-827).

CRISPy-web - is an easy to use web tool based on CRISPy to design sgRNAs for any user-provided microbial genome. CRISPy-web allows researchers to interactively select a region of their genome of interest to scan for possible sgRNAs. After checks for potential off-target matches, the resulting sgRNA sequences are displayed graphically and can be exported to text files. (Reference: K. Blin et al. 2016. Synthetic and Systems Biotechnology 1(2): 118-121).
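
Scanning a region for possible sgRNAs amounts to finding short target sequences that sit next to a protospacer adjacent motif (PAM). The sketch below assumes the common SpCas9 case of a 20-nt target followed by an NGG PAM and searches the forward strand only; both the PAM choice and the function name are assumptions, and no off-target checking is attempted.

```python
import re


def find_sgrna_candidates(sequence, spacer_len=20):
    """List candidate targets followed by an NGG PAM on the forward strand.

    Returns a list of (start, spacer, pam) tuples with 0-based start
    positions. A lookahead is used so that overlapping candidates are found.
    """
    sequence = sequence.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    return [
        (match.start(), match.group(1), match.group(2))
        for match in pattern.finditer(sequence)
    ]


if __name__ == "__main__":
    region = "A" * 20 + "TGG" + "CC"
    for start, spacer, pam in find_sgrna_candidates(region):
        print(start, spacer, pam)
```

Running the same function on the reverse complement of the region would cover targets on the other strand.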

PaCRISPR - Anti-CRISPRs are widespread amongst bacteriophage and certain mobile genetic elements (such as transposons and prophage) and by inactivating the bacterial host’s CRISPR-Cas defence system, anti-CRISPRs promote bacteriophage infection and horizontal gene transfer. PaCRISPR accurately identifies anti-CRISPRs from protein datasets derived from genome and metagenome sequencing projects (Reference: Wang J et al. Nucleic Acids Research, 48: W348–W357).

Specialized annotation - virulence determinants: This is of particular interest to those working on bacteriophages for therapy.

VirulenceFinder 2.0 (Danish Technical University) – identification of virulence genes. The method uses BLAST for identification of known virulence genes in Escherichia coli. The method is being extended to also include virulence genes for Enterococcus and Staphylococcus aureus. As input, the method can use both pre-assembled, complete or partial genomes, and short sequence reads from four different sequencing platforms.

T3DB, the Toxin and Toxin Target Database - combines detailed toxin data with comprehensive toxin target information. The database currently houses 3,053 toxins which are linked to 1,670 corresponding toxin target records. Each toxin record (ToxCard) contains over 50 data fields and holds information such as chemical properties and descriptors, toxicity values, molecular and cellular interactions, and medical information. (Reference: Lim E et al. 2010. Nucleic Acids Res. 38(Database issue): D781-786).

VFDB - is an integrated and comprehensive database of virulence factors for bacterial pathogens (also including Chlamydia and Mycoplasma). (Reference: L.H. Chen et al. 2012. Nucleic Acids Res. 40(Database issue): D641-D645).

PAIDB (Pathogenicity Island Database) - Pathogenicity islands (PAIs) and resistance islands (REIs) are key to the evolution of pathogens and appear to play complementary roles in the process of bacterial infection. While PAIs promote disease development, REIs give a fitness advantage to the host against multiple antimicrobial agents. An ancillary program, PAI Finder, identifies PAI-like regions or REI-like regions in a multi-sequence query. (Reference: S.H Yoon et al. 2015. Nucl. Acids Res. 43 (D1): D624-D630).

IslandViewer - includes a new interactive genome visualization tool, IslandPlot, and expanded virulence factor, antimicrobial resistance gene, and pathogen-associated gene annotations, as well as homologs of these genes in closely related genomes. Notably, incomplete genomes are accepted as input in IslandViewer 4, though they strongly urge users to use complete genomes whenever possible. (Reference: B.K. Dhillon et al. 2015. Nucl. Acids Res. 43 (W1): W104-W108).

Gypsy Database - an open editable database about the evolutionary relationship of viruses, mobile genetic elements (MGEs; Ty3/Gypsy, Retroviridae, Ty1/Copia and Bel/Pao LTR retroelements and the Caulimoviridae pararetroviruses of plants) and other genomic repeats. Equipped for BLAST and HMM searches. (Reference: Llorens, C et al. 2011. Nucl. Acids Res. 39(suppl 1): D70-D74).

PathogenFinder 1.1 (Danish Technical University) – Based on complete genomes from 513 bacteria annotated as human non-pathogens and 372 bacteria annotated as human pathogens, a database of protein families which are mainly associated either with non-pathogens or with pathogens has been created. This database is then used for predicting the pathogenic potential of bacteria. As input, the method can use both pre-assembled, complete or partial genomes, and short sequence reads from four different sequencing platforms. (Reference: Cosentino S et al. 2013. PLoS ONE 8: e77302)

TASmania - is a bacterial Toxin-Antitoxin Systems database that has mined over 41K assemblies of the EnsemblBacteria database for known and uncharacterized protein components of type I to IV TAS loci. (Reference: Akarsu H et al. (2019) PLoS Comput Biol 15(4): e1006946).

VirulentPred - is an SVM-based method to predict bacterial virulent protein sequences, which can be used to screen virulent proteins in proteomes. Together with experimentally verified virulent proteins, several putative, non-annotated and hypothetical protein sequences have been predicted to be high scoring virulent proteins by the prediction method. Version 2 has achieved 84.71% accuracy with the validation dataset and 85.18% on an independent test dataset. (Reference: Sharma A et al. (2023) Protein Sci 32(12): e4808).

DeepVF - explores a wide range of heterogeneous features with popular machine learning algorithms. Specifically, four classical algorithms, including random forest, support vector machines, extreme gradient boosting and multilayer perceptron, and three DL algorithms, including convolutional neural networks, long short-term memory networks and deep neural networks are employed to train 62 baseline models using these features. In order to integrate their individual strengths, DeepVF effectively combines these baseline models to construct the final meta model using the stacking strategy. Extensive benchmarking experiments demonstrate the effectiveness of DeepVF: it achieves a more accurate and stable performance compared with baseline models on the benchmark dataset and clearly outperforms state-of-the-art VF predictors on the independent test. (Reference: Xie R et al (2021) Brief Bioinform. 22(3): bbaa125).

VirulentHunter - is a novel deep learning framework designed to address the limitations of existing VF identification methods. Traditional methods primarily rely on homology alignment, which can miss novel or divergent VFs and lack effective means for VF functional classification. VirulentHunter works directly from protein sequences, using deep learning models to achieve simultaneous VF identification and classification. (Reference: Chen C et al (2025) Brief Bioinformatics 26(3): bbaf271).

Effectidor - predicts type III secretion system effectors. The Type III secretion system is an essential mechanism for host-pathogen interaction in the infection process. (Reference: Wagner, N. et al. 2022. Bioinformatics, 38(8): 2341–2343).

Bastion3 - is a two-layer ensemble predictor developed to accurately identify type III secreted effectors from protein sequence data. In contrast with existing methods that employ single models with few features, Bastion3 explores a wide range of features, from various types, trains single models based on these features and finally integrates these models through ensemble learning. (Reference: Wang J et al. Bioinformatics, 35(12): 2017–2028).

Specialized annotation - Genomic Islands:

Phage_Finder - was created to identify prophage regions in completed bacterial genomes. Using a test dataset of 42 bacterial genomes whose prophages have been manually identified, Phage_Finder found 91% of the regions, resulting in 7% false positive and 9% false negative prophages. A search of 302 complete bacterial genomes predicted 403 putative prophage regions, accounting for 2.7% of the total bacterial DNA. Analysis of the 285 putative attachment sites revealed tRNAs are targets for integration slightly more frequently (33%) than intergenic (31%) or intragenic (28%) regions, while tmRNAs were targeted in 8% of the regions. (Reference: D.E. Fouts. 2006. Nucleic Acids Res. 34: 5839–5851).

ProphET - identifies prophages in three steps: similarity search, calculation of the density of prophage genes, and edge refinement. ProphET performance was evaluated and compared with other phage predictors based on a set of 54 bacterial genomes containing 267 manually annotated prophages. This tool is part of the TAMU Galaxy suite (Reference: Reis-Cunha J.L. et al. 2019. PLOS ONE, 14(10): e0223364).
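The "density of prophage genes" step can be pictured with a small sliding-window sketch. This is only an illustration of the general idea, not ProphET's actual code: the function name, the window size and the density cut-off are all hypothetical, and the input is assumed to be a list of flags (one per gene, in genome order) saying whether a similarity search found a phage-like hit. Edge refinement is not attempted here.

```python
def prophage_candidate_windows(is_phage_like, window=10, min_density=0.5):
    """Return (first_gene, last_gene) index ranges with a high density of phage-like genes.

    is_phage_like: list of 0/1 (or bool) flags, one per gene in genome order.
    window and min_density are illustrative values, not ProphET defaults.
    """
    n = len(is_phage_like)
    flagged = [False] * n
    # Slide a fixed-size window along the genes and flag dense windows.
    for start in range(0, max(n - window + 1, 1)):
        chunk = is_phage_like[start:start + window]
        if chunk and sum(chunk) / len(chunk) >= min_density:
            for i in range(start, start + len(chunk)):
                flagged[i] = True
    # Merge runs of consecutive flagged genes into candidate regions.
    regions = []
    region_start = None
    for i, flag in enumerate(flagged):
        if flag and region_start is None:
            region_start = i
        elif not flag and region_start is not None:
            regions.append((region_start, i - 1))
            region_start = None
    if region_start is not None:
        regions.append((region_start, n - 1))
    return regions
```

Note that the candidate region returned this way is usually a little wider than the run of phage-like genes itself, which is one reason a separate edge refinement step is needed.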

PhageBoost - predicts prophages in bacterial genomes and metagenomic contigs based on biological features without sequence similarities (Reference: Sirén K et al (2021) NAR Genom Bioinform 3(1): lqaa109).

PHASTEST (PHAge Search Tool for Enhanced Sequence Translation) - is designed to support the rapid identification, annotation and visualization of prophage sequences within bacterial genomes and plasmids. PHASTEST also supports extensive annotation and interactive visualization of all other genes (protein coding regions, tRNA sequences and rRNA sequences) in those same genomes. (Reference: Wishart D et al. Nucleic Acids Res. (2023) 51(W1): W443-W450).

Prophage Hunter - provides a one-stop web service to extract prophage genomes from bacterial genomes, evaluate the activity of the prophages, identify phylogenetically related phages, and annotate the function of phage proteins. (Reference: Song W et al. (2019) Nucleic Acids Res 47(W1): W74–W80).

IslandViewer - integrates four different genomic island prediction methods: IslandPick, IslandPath-DIMOB, SIGI-HMM, and Islander (Reference: Bertelli et al. 2017. Nucleic Acids Res. 45(W1): W30–W35).

PAIDB (PAthogenicity Island DataBase) has made an effort to collect known PAIs and to detect the potential PAI regions in the prokaryotic complete genomes. Pathogenicity islands (PAIs) are distinct genetic elements of pathogens encoding various virulence factors. (Reference: Yoon SH et al. 2007. Nucleic Acids Res. 35 (Database Issue): D395-D400).

Genome comparisons and synteny:

SyntTax - is a web server linking synteny to prokaryotic taxonomy. SyntTax incorporates a full hierarchical taxonomic tree allowing intuitive access to all completely sequenced prokaryotes (Archaea and Bacteria). Single or multiple organisms can be chosen on the basis of their lineage by selecting the corresponding rank nodes in the tree. This is my favourite among the synteny programs (Reference: Oberto J. 2013. BMC Bioinformatics. 14:4). The results below were generated using the heat-shock sigma factor (RpoH) from Salmonella Typhimurium against the Pseudomonadales.

Cinteny Server for Synteny Identification and Analysis of Genome Rearrangement (A. U. Sinha & J. Meller, University of Cincinnati, USA) - this server can be used for finding regions syntenic across multiple genomes and measuring the extent of genome rearrangement using reversal distance as a measure. You may create a project and upload your own data or work with pre-loaded prokaryote or eukaryote data.

SimpleSynteny - provides a pipeline for evaluating the synteny of a preselected set of gene targets across multiple organismal genomes. An emphasis has been placed on ease-of-use, and users are only required to submit FASTA files for their genomes and genes of interest. SimpleSynteny then guides the user through an iterative process of exploring and customizing genomes individually before combining them into a final high-resolution figure. (Reference: Veltri D et al. 2016. Nucleic Acids Res. 44(Web Server issue): W41–W45).

Synteny Portal - eukaryotic genome users can easily (i) construct synteny blocks among multiple species by using prebuilt alignments in the UCSC genome browser database, (ii) visualize and download syntenic relationships as high-quality images, (iii) browse synteny blocks with genetic information and (iv) download the details of synteny blocks to be used as input for downstream synteny-based analyses, all in an intuitive and easy-to-use web-based interface. (Reference: Lee J et al. 2016. Nucleic Acids Res 44(W1): W35–W40).

Kablammo helps you create interactive visualizations of BLAST results from your web browser. Find your most interesting alignments, list detailed parameters for each, and export a publication-ready vector image. Incredibly easy to use - here are the results for a BLASTN comparison of Escherichia phages T1 (query) and ADB-2. (Reference: Wintersinger JA et al. Bioinformatics 31:1305-1306).

M1CR0B1AL1Z3R - is a 'one-stop shop' for conducting microbial genomics data analyses via a simple graphical user interface. Some of the features implemented in M1CR0B1AL1Z3R are: (i) extracting putative open reading frames and comparative genomics analysis of gene content; (ii) extracting orthologous sets and analyzing their size distribution; (iii) analyzing gene presence-absence patterns; (iv) reconstructing a phylogenetic tree based on the extracted orthologous set; (v) inferring GC-content variation among lineages. M1CR0B1AL1Z3R facilitates the mining and analysis of dozens of bacterial genomes using advanced techniques. (Reference: Avram O et al. (2019) Nucleic Acids Res. 47(W1): W88-W92).

GeneOrder 4.0 (D. Seto, Bioinformatics & Computational Biology, George Mason Univ., U.S.A.) is designed to compare the gene order between two bacterial genomes (Reference: Mahadevan P. & Seto D. 2010. BMC Research Notes 3:41).

CoreGenes (D. Seto & P. Mahadevan, Bioinformatics & Computational Biology, George Mason Univ., U.S.A.) - tallies the total number of genes in common between the two genomes being compared; displays the percent value of genes in common with a specific genome; determines the unique genes contained in a pair of proteomes. CoreGenes 3.5 is the batch CoreGenes server. I have extensively used this set of resources in the classification of bacterial viruses.

If you have a gbk file for a phage which has not yet been deposited in GenBank you can use these instructions to convert your data into CoreGenes format for use here.

CoreGenes 5.0: A Webserver For The Determination Of Core Genes From Sets Of Viral And Bacterial Genomes (Padmanabhan Mahadevan, University of Tampa, FL, USA) - allows up to 20 GenBank accession numbers to be entered manually; more than 20 accession numbers can be submitted using the "File Upload" feature. The program will provide Bidirectional Best Hit, OrthoMCL or COGTriangle results. This program has proved very useful in recent studies on the classification of bacterial viruses. (Reference: Davis, P. et al. Viruses. 2022. 14(11): 2534).

CAGECAT - the online CompArative GEne Cluster Analysis Toolbox consists of cblaster and clinker, which generate publication-quality gene cluster comparison figures. (Reference: Gilchrist CLM & Chooi Y-H. 2021. Bioinformatics 37(16): 2473-2475). On this website it is possible to choose protein comparison (clinker) or DNA comparison (cblaster). Below is a clinker diagram showing relatedness between a pair of phage proteomes. Clinker is also available here.

EDGAR (Efficient Database framework for comparative Genome Analyses using BLAST score Ratios) - EDGAR is designed to automatically perform genome comparisons in a high throughput approach and can be used for core genome, pan genome and singleton analysis, and Venn diagram construction. (Reference: Blom J. et al. 2009. BMC Bioinformatics 10: 154).
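Core genome, pan genome and singleton analysis boil down to set operations once each genome has been reduced to a set of ortholog-group identifiers. The sketch below shows only that final bookkeeping step; it assumes the ortholog assignment has already been done (EDGAR does this with BLAST score ratios, which are not reproduced here), and the function name and input layout are hypothetical.

```python
def summarize_gene_content(genomes):
    """Compute core genome, pan genome and per-genome singletons.

    genomes: dict mapping a genome name to a set of ortholog-group IDs
             (hypothetical input; the ortholog calling is assumed done).
    Returns (core, pan, singletons), where singletons maps each genome
    name to the groups found in that genome only.
    """
    sets = list(genomes.values())
    pan = set().union(*sets)                           # present in at least one genome
    core = set.intersection(*sets) if sets else set()  # present in every genome
    singletons = {}
    for name, groups in genomes.items():
        others = set().union(*(g for other, g in genomes.items() if other != name))
        singletons[name] = groups - others
    return core, pan, singletons


example = {
    "phage_A": {"g1", "g2", "g3"},
    "phage_B": {"g1", "g2", "g4"},
    "phage_C": {"g1", "g5"},
}
core, pan, singletons = summarize_gene_content(example)
```

The three sets in the example are also exactly the input a three-way Venn diagram needs.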

OrthoVenn3 - enables users to efficiently identify and annotate orthologous clusters and infer phylogenetic relationships across a range of species. The latest upgrade of OrthoVenn includes several important new features, including enhanced orthologous cluster identification accuracy, improved visualization capabilities for numerous sets of data, and wrapped phylogenetic analysis. (Reference: Sun J. et al. 2023. Nucl. Acids Res. 51(W1): W397-W403).

Phylogeny (AAI and ANI):

ANI (Average Nucleotide Identity) calculator - estimates the average nucleotide identity using both best hits (one-way ANI) and reciprocal best hits (two-way ANI) between two genomic datasets. Typically, the ANI values between genomes of the same species are above 95% (e.g., Escherichia coli). Values below 75% are not to be trusted, and AAI should be used instead. This tool supports both complete and draft genomes (multi-fasta). (Reference: Goris J et al. 2007. Int J Syst Evol Microbiol. 57(Pt 1): 81-91). Also see ANI calculator and FastANI which is part of the KBase suite of programs (requires registration).
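The difference between one-way and two-way ANI can be sketched as follows. This is a simplified illustration, not the calculator's own code: it assumes the best hits have already been found by a search tool, and the data layout (a dict from query fragment to its best target and percent identity) and the function names are hypothetical.

```python
from statistics import mean


def one_way_ani(best_hits):
    """Mean percent identity over the best hits of one genome against the other.

    best_hits: dict mapping a query fragment ID to (target fragment ID, percent identity).
    """
    return mean(identity for _, identity in best_hits.values())


def two_way_ani(a_to_b, b_to_a):
    """Mean percent identity over reciprocal best hits only.

    A pair counts only if fragment a picks b as its best hit AND b picks a.
    Returns None when there are no reciprocal pairs.
    """
    identities = []
    for a_frag, (b_frag, identity_ab) in a_to_b.items():
        back = b_to_a.get(b_frag)
        if back is not None and back[0] == a_frag:
            identities.append((identity_ab + back[1]) / 2)
    return mean(identities) if identities else None


a_to_b = {"a1": ("b1", 98.0), "a2": ("b2", 96.0), "a3": ("b2", 90.0)}
b_to_a = {"b1": ("a1", 98.0), "b2": ("a2", 96.0)}
```

In this toy example fragment a3 has a best hit in genome B but is not the best hit coming back, so it contributes to the one-way value and is dropped from the two-way value.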

Average Nucleotide Identity (ANI) calculator - their ANI Calculator uses the OrthoANIu algorithm, an improved iteration of the original OrthoANI algorithm, which uses USEARCH instead of BLAST (Reference: Yoon, S. H. et al. (2017). Antonie van Leeuwenhoek. 110:1281–1286).

VIRIDIC (Virus Intergenomic Distance Calculator; C. Moraru, Institute for Chemistry and Biology of the Marine Environment, Germany) - the first level of bacteriophage classification by ICTV involves computing the overall DNA sequence identity between two viruses. This new tool computes pairwise intergenomic distances/similarities amongst phage genomes. To run it, upload a single fasta file with all phage genomes of interest, create a project and press run. Save the project ID that will be displayed when the project is created. You will need it to access the data if the calculations take a long time (Reference: Moraru C et al. 2020. Viruses. 12(11): 1268).

GGDC (Genome-To-Genome Distance Calculator) - provides methods for inferring whole-genome distances which are well able to mimic DNA-DNA hybridization (DDH). Values calculated with GGDC yield a somewhat better correlation with wet-lab DDH values than alternative approaches such as "ANI". These distance functions can also cope with heavily reduced genomes and repetitive sequence regions. Some of them are also very robust against missing fractions of genomic information (due to incomplete genome sequencing). Thus, this web service can be used for genome-based species delineation. (Reference: Meier-Kolthoff JP et al. 2013. BMC Bioinformatics 14: 60).

POGO-DB - Based on computationally intensive whole-genome BLASTs, POGO-DB provides several metrics on pairwise genome comparisons: (a) Average Amino Acid Identity of all bi-directional best BLAST hits that covered at least 70% of the sequence and had at least 30% sequence identity; (b) Genomic Fluidity that estimates the similarity in gene content between two genomes; (c) Number of orthologs shared between two genomes (as defined by two criteria); (d) Pairwise identity of the most similar 16S rRNA genes; (e) Pairwise identity of 73 additional globally-conserved marker genes (which were determined by the authors to exist in at least 90% of all the genomes). (Reference: Lan Y et al. 2014. Nucl. Acids Res. 42(D1): D625-D632).
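Metric (a) can be illustrated with a short sketch that applies the two cut-offs quoted above (at least 70% coverage and 30% identity) to a list of bi-directional best hits and averages what is left. The hit records are hypothetical dictionaries, and the function is not POGO-DB's own implementation; finding the bi-directional best hits is assumed to have been done beforehand.

```python
def average_amino_acid_identity(hits, min_coverage=70.0, min_identity=30.0):
    """Average percent identity of the hits that pass both filters.

    hits: list of dicts with 'identity' and 'coverage' keys, both in percent
          (hypothetical layout; each record is one bi-directional best hit).
    Returns None if no hit passes the filters.
    """
    kept = [
        h["identity"]
        for h in hits
        if h["coverage"] >= min_coverage and h["identity"] >= min_identity
    ]
    return sum(kept) / len(kept) if kept else None


example_hits = [
    {"identity": 80.0, "coverage": 95.0},
    {"identity": 60.0, "coverage": 72.0},
    {"identity": 90.0, "coverage": 40.0},  # dropped: coverage too low
    {"identity": 25.0, "coverage": 99.0},  # dropped: identity too low
]
```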

VICTOR (Virus Classification and Tree Building Online Resource; Leibniz-Institut DSMZ-Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH). This web service compares bacterial and archaeal viruses ("phages") using their genome or proteome sequences. The results include phylogenomic trees inferred using the Genome-BLAST Distance Phylogeny method (GBDP), with branch support, as well as suggestions for the classification at the species, genus and family level. (The service can be applied to other kinds of viruses, too, but has not yet been tested in this respect.) Upload your FASTA files, GenBank files and/or GenBank accession IDs. (Reference: JP Meier-Kolthoff & M Göker. 2017. Bioinformatics 33(21): 3396–3404).

VIRFAM is dedicated to the recognition of head-neck-tail modules and of recombinase genes in phage genomes. You can use this server to search for remote homologs of specific protein families within protein sequences of bacteriophages. Input: protein sequences of your phage; output includes a phylogenetic tree with the placement of your virus. (Reference: Lopes A et al. Nucleic Acids Res. (2010) 38(12): 3952-62).

VipTree - generates a "proteomic tree" of viral genome sequences based on genome-wide sequence similarities computed by tBLASTx. The original proteomic tree concept (i.e., "the Phage Proteomic Tree") was developed by Rohwer and Edwards, 2002. A proteomic tree is a dendrogram that reveals global genomic similarity relationships between tens, hundreds, and thousands of viruses. It has been shown that viral groups identified in a proteomic tree correspond well to established viral taxonomies. (Reference: Nishimura Y et al. (2017) Bioinformatics 33: 2379–2380).

MiGA (Microbial Genomes Atlas) - a webserver that allows the classification of an unknown query genomic sequence, complete or partial, against all taxonomically classified taxa with available genome sequences, as well as comparisons to other related genomes including uncultivated ones, based on the genome-aggregate Average Nucleotide and Amino Acid Identity (ANI/AAI) concepts. (Reference: Rodriguez-R et al (2018) Nucleic Acids Research 46(W1): W282-W288).

Genome visualization:

Proksee (Paul Stothard, Univ. Alberta, Canada) - is an updated version of my go-to program for analysis and visualization of bacterial and phage genomes - CGView Server. This version includes integrated genome annotation tools and a new CGView engine written in JavaScript that allows for rapid zooming to the DNA sequence level. Extensive options are available for customizing maps and highlighting features of interest. (Instructions)

PlasMapper 3.0 - allows users to generate, edit, annotate and interactively visualize publication quality plasmid maps. Additionally, it offers an option of automated codon optimization and BLAST sequence alignment. (Reference: Wishart DS et al. 2023. Nucleic Acids Res 51(W1): W459-W467).

Jena Prokaryotic Genome Viewer (JPGV) - generates linear or circular plots from a GenBank flatfile (*.gbk); GC content, GC skew, purine excess and keto excess can also be displayed if desired. Also allows BLAST analysis against related genomes. Requires free registration.
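For readers unfamiliar with the tracks mentioned above, GC content and GC skew are simple windowed statistics: GC content is (G+C)/window length and GC skew is (G-C)/(G+C). The sketch below computes both along a sequence. It is a generic illustration, not JPGV's code; the function name and the default window and step sizes are arbitrary choices.

```python
def gc_profile(sequence, window=1000, step=1000):
    """Return a list of (window start, GC content, GC skew) tuples.

    GC content = (G + C) / window length
    GC skew    = (G - C) / (G + C), reported as 0.0 when a window has no G or C.
    Window and step defaults are illustrative only.
    """
    seq = sequence.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc_content = (g + c) / window
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        profile.append((start, gc_content, gc_skew))
    return profile


toy_profile = gc_profile("GGGCAAAA" + "CCCGTTTT", window=8, step=8)
```

In the toy sequence the GC content is the same in both windows, but the skew changes sign, which is the kind of switch that genome plots use to highlight replication origins and termini.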

GenomeVx - makes editable, publication-quality, maps of mitochondrial and chloroplast genomes and of large plasmids. These maps show the location of genes and chromosomal features as well as a position scale. The program takes as input either raw feature positions or GenBank records. In the latter case, features are automatically extracted and colored. Output is in the Adobe Portable Document Format (PDF) and can be edited by programs such as Adobe Illustrator. (Reference: G. Conant & K. Wolfe. 2008. Bioinformatics 24:861-862).

myGenomeBrowser - is a web-based environment that provides biologists with a way to build, query and share their genome browsers. This tool, that builds on JBrowse, is designed to give users more autonomy while simplifying and minimizing intervention from system administrators. They have extended genome browser basic features to allow users to query, analyze and share their data. (Reference: S. Carrere & J. Gouzy. Bioinformatics (2017) 33 (8): 1255-1257).

OrganellarGenomeDRAW - is a suite of software tools that enable users to create high-quality visual representations of both circular and linear annotated genome sequences provided as GenBank files or accession numbers. Although all types of DNA sequences are accepted as input, the software has been specifically optimized to properly depict features of organellar genomes. A recent extension facilitates the plotting of quantitative gene expression data, such as transcript or protein abundance data, directly onto the genome map (Reference: Lohse M, et al. 2013. Nucleic Acids Res. 41(Web Server issue): W575-81).

PlasmaDNA - Starting with a primary DNA sequence, PlasmaDNA looks for restriction sites, open reading frames, primer annealing sequences, and various common domains. The databases are easily expandable by users to fit their most common cloning needs. PlasmaDNA can manage and graphically represent multiple sequences at the same time, and keeps in memory the overhangs at the end of the sequences if any. This means that it is possible to virtually digest fragments, to add the digestion products to the project, and to ligate together fragments with compatible ends to generate the new sequences. Excellent package for plasmids. (Reference: Angers-Loustau A et al. 2007. BMC Mol Biol. 8:77).

GECA - is a user-friendly tool for representing gene exon/intron organization and highlighting changes in gene structure among members of a gene family. It relies on protein alignment, completed with the identification of common introns in the corresponding genes using CIWOG. GECA produces a main graphical representation showing the resulting aligned set of gene structures, where exons are to scale. The important and original feature of GECA is that it combines these gene structures with a symbolic display highlighting sequence similarity between subsequent genes. It is worth noting that this combination of gene structure with the indications of similarities between related genes allows rapid identification of possible events of gain or loss of introns, or points to erroneous structural annotations. The output image is generated in a portable network graphics format which can be used for scientific publications. (Reference: Fawal N, et al. 2012. Bioinformatics; 28:1398-9).

Metagenomics:

MG-RAST (the Metagenomics RAST) server is an automated analysis platform for metagenomes providing quantitative insights into microbial populations based on sequence data. The server primarily provides upload, quality control, automated annotation and analysis for prokaryotic metagenomic shotgun samples. (Reference: Wilke A, et al. 2016. Nucleic Acids Res. 44(D1):D590-4).

AmphoraNet2 - uses 31 bacterial and 104 archaeal protein coding marker genes for metagenomic and genomic phylotyping. Most of these are single copy genes, therefore AmphoraNet is suitable for estimating the taxonomic composition of bacterial and archaeal communities from metagenomic shotgun sequencing data. (Reference: Kerepesi C, et al. 2014. Gene. 533:538-40).

EBI Metagenomics (EMBL-EBI) - is an automated pipeline for the analysis and archiving of metagenomic data that aims to provide insights into the phylogenetic diversity as well as the functional and metabolic potential of a sample. You can freely browse all the public data in the repository. The service identifies rRNA sequences, using rRNASelector, and performs taxonomic analysis upon 16S rRNAs using Qiime. The remaining reads are submitted for functional analysis of predicted protein coding sequences using the InterPro sequence analysis resource. InterPro uses diagnostic models to classify sequences into families and to predict the presence of functionally important domains and sites. By utilising this resource, the service offers a powerful and sophisticated alternative to BLAST-based functional metagenomic analyses. Data submitted to the EBI Metagenomics service is automatically archived in the European Nucleotide Archive (ENA). Accession numbers are supplied for sequence data.

Meta sites:

DNAATLAS (DNA2.0 Inc., U.S.A.) - A place for all your sequences. Easily import all your constructs including Genbank, Gene Designer, Excel, Word, and nearly any text-based format. DNA Atlas immediately parses your upload files and infers whether each sequence is a feature, construct, primer, DNA or amino acid. Upload features and primers to see them annotated in your sequences. Instantly view constructs annotated with our curated list of over 1000 features, or add your own. Use the BLAST-based sequence search to quickly align and compare your sequences. Keep track of your sequences, features, and primers. Categorize them using tags - from freezer locations to characterization data. (requires registration).

Naming your bacteriophage: It is of prime importance that members of the bacterial virus community name their newly isolated phages appropriately. A good place to start is "How to Name and Classify Your Phage: An Informal Guide" (Reference: Adriaenssens E & Brister JR. 2017. Viruses 9(4). pii: E70), to which I will add the following points: (a) please check that the name you propose has not been used already; and (b) do not name your phage Enterobacteria phage ø1234 or Enterobacteria phage 2017/ABC_567, since these names are incompatible with the creation of new species and genera taxa by the International Committee on Taxonomy of Viruses (ICTV). To find if your proposed name is unique consult:

Phage Name Check (Stephen T. Abedon, Ohio State University, USA) - to see whether 'your' phage name is currently found on Google Scholar, Google Books, PubMed, or even Bacteriophage Names 2000.

CPT Phage Name Search (Center for Phage Technology at Texas A&M University)

Other useful phage resources:

PhageScope - applying fifteen state-of-the-art tools to perform systematic annotations and analyses, PhageScope provides annotations on genome completeness, host range, lifestyle information, taxonomy classification, nine types of structural and functional genetic elements, and three types of comparative genomic studies for curated phages. Additionally, PhageScope incorporates automatic analyses and visualizations for curated and customized phages, serving as an efficient platform for phage study (Reference: Wang RH et al (2024) Nucleic Acids Research, 52(D1): D756–D761).

VIRALpro - is an effective tool for identifying capsid and tail protein sequences, which are the cornerstones toward viral sequence annotation and viral genome classification. It is part of the SCRATCH Protein Predictor, a useful protein analysis meta site (Reference: Galiez C et al. (2016) Bioinformatics, 32(9): 1405–1407).

taxMyPhage - is a system for the rapid automated classification of dsDNA bacteriophage genomes. The system integrates a MASH database, built from ICTV-classified phage genomes to identify closely related phages, followed by BLASTn to calculate intergenomic similarity, conforming to ICTV guidelines for genus and species classification. The system also detected inconsistencies in current ICTV classifications, identifying cases where genera did not adhere to ICTV's 70% average nucleotide identity (ANI) threshold for genus classification or 95% ANI for species. (Reference: Millard A et al. (2025) Phage (New Rochelle). 6(1): 5-11).
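The two thresholds quoted above (70% for genus, 95% for species) translate into a very small decision rule. The sketch below is only that rule, applied to a single similarity value against one reference phage. It is not taxMyPhage's code: the function name is hypothetical, the MASH and BLASTn steps are not shown, and treating a value exactly on a threshold as passing is an assumption.

```python
def classify_against_reference(similarity_percent,
                               genus_threshold=70.0,
                               species_threshold=95.0):
    """Place a query phage relative to one classified reference phage.

    similarity_percent: intergenomic similarity to the reference, in percent.
    Values equal to a threshold are treated as passing it (an assumption).
    """
    if similarity_percent >= species_threshold:
        return "same species"
    if similarity_percent >= genus_threshold:
        return "same genus, different species"
    return "different genus"
```

For example, a query that is 82% similar to its closest classified relative would be reported as belonging to the same genus but representing a different species.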

PhaBOX - can comprehensively identify and analyze phage contigs in metagenomic data. It supports integrated phage analysis, including phage contig identification from the metagenomic assembly, lifestyle prediction, taxonomic classification, and host prediction. Instead of treating the algorithms as a black box, PhaBOX also supports visualization of the essential features for making predictions. The web server is designed with a user-friendly graphical interface that enables both informatics-trained and nonspecialist users to analyze phages in microbiome data with ease. Click on Pipeline to get started (Reference: Shang J et al. (2023) Bioinform Adv 3(1): vbad101).

PhageGE (Phage Genome Explorer) - is a user-friendly graphical interface application for the interactive analysis of phage genomes. PhageGE enables users to perform key analyses, including phylogenetic analysis, visualization of phylogenetic trees, prediction of phage life cycle, and comparative analysis of phage genome annotations. (Reference: Zhao J et al (2024) Gigascience 13: giae074).

PhageAI - provides access to more than 10 000 publicly available bacteriophages and differentiates between their major types of life cycle: lytic and lysogenic. The tool includes a life cycle classifier which achieved 98.90% accuracy on a validation set and 97.18% average accuracy on a test set. (Reference: Piotr Tynecki, Arkadiusz Guzinski, Joanna Kazimierczak, Michal Jadczuk, Jaroslaw Dastych, Agnieszka Onisko. doi: https://doi.org/10.1101/2020.07.11.198606). Requires free registration.

VAPEX (Virus And Phage EXplorer) - is an interactive web server for the deep exploration of natural virus and phage genomes. This tool enables users to easily perform various genomic analysis queries on all natural viruses and phages that have been fully sequenced and are listed in the NCBI compendium. VAPEX therefore excels in producing visual depictions of fully resolved synteny maps, which is one of its key strengths. VAPEX has the ability to exhibit a vast array of orthologous gene classes simultaneously through the use of symbolic representation. Additionally, VAPEX can fully analyze user-submitted viral and phage genomes, including those that have not yet been annotated. (Reference: Hepp B et al. (2023) Bioinformatics 39(8): btad528).

PhaGAA - is an integrated web server platform for phage genome annotation and analysis. By incorporating several annotation tools, PhaGAA is constructed to annotate phage genomes at the DNA and protein levels and provide the analytical results. DNA-based annotation includes host prediction, closest phage search, lifestyle recognition, and promoter and spanin gene identification. Protein-based annotation comprises virion protein identification, protein domain identification, and structural and functional protein classification. (Reference: Wu J et al. 2023. Bioinformatics 39(3): btad120).

VIRFAM - allows automated classification of tailed bacteriophages according to their neck organization. This webserver automatically identifies proteins of the phage head-neck-tail module and assigns phages to the most closely related cluster of phages. (Reference: Lopes A et al. (2014) BMC Genomics 15: 1027).

PHISDetector - Phage-microbe interactions leave diverse signals in bacterial and phage genomic sequences, defined as phage-host interaction signals (PHISs), which include clustered regularly interspaced short palindromic repeats (CRISPR) targeting, prophage, and protein-protein interaction signals. PHISDetector (phage-host interaction signal detector) predicts phage-host interactions by detecting and integrating diverse in silico PHISs, and scores the probability of phage-host interactions using machine learning models based on PHIS features. (Reference: Zhou F et al (2022) Genomics Proteomics Bioinformatics. 20(3): 508-523).

PhageTailFinder - is novel software suitable for high-throughput phage tail region detection. It requires phage genomic sequences in GenBank or FASTA format as input.

UPDATED: August, 2025