Specialized Annotation - Virulence Determinants

VirulenceFinder 2.0

VirulenceFinder 2.0 (Technical University of Denmark) - identification of virulence genes. The method uses BLAST to identify known virulence genes in Escherichia coli, and is being extended to include virulence genes for Enterococcus and Staphylococcus aureus. As input, the method accepts pre-assembled complete or partial genomes as well as short sequence reads from four different sequencing platforms.
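
The BLAST-based identification described above can be illustrated with a minimal sketch that filters tabular BLAST hits by identity and coverage. This is not VirulenceFinder's own code: the column layout (as produced by blastn -outfmt "6 qseqid sseqid pident length slen evalue"), the thresholds, the function name filter_hits and the example rows are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical thresholds chosen for illustration only.
MIN_IDENTITY = 90.0   # percent identity of the alignment
MIN_COVERAGE = 0.60   # alignment length / length of the database gene

# Assumed column layout of the tab-separated BLAST output.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "slen", "evalue"]


def filter_hits(tabular_text, min_identity=MIN_IDENTITY, min_coverage=MIN_COVERAGE):
    """Keep the best hit per database gene that passes both thresholds."""
    best = {}
    reader = csv.DictReader(io.StringIO(tabular_text), fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        identity = float(row["pident"])
        coverage = int(row["length"]) / int(row["slen"])
        if identity < min_identity or coverage < min_coverage:
            continue  # hit is too divergent or too short
        hit = {
            "contig": row["qseqid"],
            "gene": row["sseqid"],
            "identity": identity,
            "coverage": coverage,
        }
        current = best.get(hit["gene"])
        # Prefer higher identity, then higher coverage.
        if current is None or (identity, coverage) > (current["identity"], current["coverage"]):
            best[hit["gene"]] = hit
    return sorted(best.values(), key=lambda h: h["gene"])


# Made-up example rows: contig, gene, identity, alignment length, gene length, e-value.
EXAMPLE = "\n".join([
    "contig1\tstx2A\t99.5\t960\t960\t0.0",
    "contig2\teae\t85.0\t2800\t2820\t0.0",       # fails the identity threshold
    "contig3\thlyA\t97.2\t1000\t3075\t1e-150",   # fails the coverage threshold
    "contig4\tstx2A\t92.0\t700\t960\t1e-120",    # passes, but is a weaker stx2A hit
])

hits = filter_hits(EXAMPLE)
for h in hits:
    print(h["gene"], h["contig"], h["identity"], round(h["coverage"], 3))
```

Keeping only the best hit per gene avoids reporting the same virulence gene several times when it matches more than one contig.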


T3DB

T3DB (the Toxin and Toxin Target Database) - combines detailed toxin data with comprehensive toxin target information. The database currently houses 3,053 toxins, which are linked to 1,670 corresponding toxin target records. Each toxin record (ToxCard) contains over 50 data fields and holds information such as chemical properties and descriptors, toxicity values, molecular and cellular interactions, and medical information.
(Reference: Lim E et al. 2010. Nucleic Acids Res. 38(Database issue): D781-786).


VFDB

VFDB - is an integrated and comprehensive database of virulence factors for bacterial pathogens (also including Chlamydia and Mycoplasma).
(Reference: L.H. Chen et al. 2012. Nucleic Acids Res. 40(Database issue): D641-D645).


PAIDB

PAIDB (Pathogenicity Island Database) - Pathogenicity islands (PAIs) and resistance islands (REIs) are key to the evolution of pathogens and appear to play complementary roles in the process of bacterial infection. While PAIs promote disease development, REIs give a fitness advantage to the host against multiple antimicrobial agents. An ancillary program, PAI Finder, identifies PAI-like or REI-like regions in a multi-sequence query.
(Reference: S.H. Yoon et al. 2015. Nucl. Acids Res. 43 (D1): D624-D630).


IslandViewer

IslandViewer - includes a new interactive genome visualization tool, IslandPlot, and expanded virulence factor, antimicrobial resistance gene, and pathogen-associated gene annotations, as well as homologs of these genes in closely related genomes. Notably, incomplete genomes are accepted as input in IslandViewer 4, though the developers strongly urge users to submit complete genomes whenever possible.
(Reference: B.K. Dhillon et al. 2015. Nucl. Acids Res. 43 (W1): W104-W108).


Gypsy Database

Gypsy Database - an open editable database about the evolutionary relationship of viruses, mobile genetic elements (MGEs; Ty3/Gypsy, Retroviridae, Ty1/Copia and Bel/Pao LTR retroelements and the Caulimoviridae pararetroviruses of plants) and other genomic repeats. Equipped for BLAST and HMM searches.
(Reference: Llorens, C et al. 2011. Nucl. Acids Res. 39(suppl 1): D70-D74).


PathogenFinder 1.1

PathogenFinder 1.1 (Technical University of Denmark) - Based on complete genomes from 513 bacteria annotated as human non-pathogens and 372 bacteria annotated as human pathogens, a database of protein families that are mainly associated either with non-pathogens or with pathogens has been created. This database is then used to predict the pathogenic potential of bacteria. As input, the method accepts pre-assembled complete or partial genomes as well as short sequence reads from four different sequencing platforms.
(Reference: Cosentino S et al. 2013. PLoS ONE 8: e77302)
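
The idea of predicting pathogenic potential from protein families associated with pathogens or non-pathogens can be sketched as a simple weighted vote. This is an illustration only, not PathogenFinder's actual model: the family names, the weights and both function names are hypothetical.

```python
import math

# Hypothetical family weights, invented for illustration:
# > 0 means the family is mainly seen in pathogens,
# < 0 means it is mainly seen in non-pathogens.
FAMILY_WEIGHTS = {
    "PF_toxin_A": 2.0,
    "PF_adhesin_B": 1.0,
    "PF_commensal_C": -1.5,
    "PF_commensal_D": -0.5,
}


def pathogenicity_score(matched_families, weights=FAMILY_WEIGHTS):
    """Sum the weights of the distinct families matched in a genome.

    Families that are absent from the weight table contribute 0.
    """
    return sum(weights.get(family, 0.0) for family in set(matched_families))


def pathogen_probability(matched_families, weights=FAMILY_WEIGHTS):
    """Squash the raw score into the interval (0, 1) with a logistic function."""
    return 1.0 / (1.0 + math.exp(-pathogenicity_score(matched_families, weights)))


# A genome matching two pathogen-associated families and one unknown family.
query = ["PF_toxin_A", "PF_adhesin_B", "PF_unknown_X"]
print(pathogenicity_score(query), round(pathogen_probability(query), 3))
```

A genome with no matches to any weighted family receives a probability of 0.5 in this sketch, which reflects the absence of evidence in either direction.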


TASmania

TASmania - is a bacterial Toxin-Antitoxin Systems database built by mining over 41K assemblies of the EnsemblBacteria database for known and uncharacterized protein components of type I to IV TAS loci.
(Reference: Akarsu H et al. (2019) PLoS Comput Biol 15(4): e1006946).


VirulentPred

VirulentPred - is an SVM-based method for predicting bacterial virulent protein sequences, which can be used to screen proteomes for virulent proteins. Together with experimentally verified virulent proteins, several putative, non-annotated and hypothetical protein sequences have been predicted to be high-scoring virulent proteins by the method. Version 2 achieved 84.71% accuracy on the validation dataset and 85.18% on an independent test dataset.
(Reference: Sharma A et al. (2023) Protein Sci 32(12): e4808).
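
Sequence-based classifiers such as an SVM need each protein converted into a fixed-length numeric vector. The sketch below computes amino acid composition, one common encoding of this kind. It is an assumption for illustration and is not stated here to be the feature set that VirulentPred uses; the function name and example sequence are hypothetical.

```python
# The 20 standard amino acids, in a fixed order that defines the vector layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-element list with the fraction of each standard residue.

    Non-standard characters (e.g. X, *, gaps) are ignored. An input with no
    standard residues yields a vector of zeros.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
            total += 1
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Made-up peptide used only to show the call.
features = amino_acid_composition("MKKLLPTAAAGLLLLAAQPAMA")
print(len(features), round(sum(features), 6))
```

Every protein maps to a vector of the same length regardless of sequence length, which is what allows the vectors to be passed to a standard classifier.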


DeepVF

DeepVF - explores a wide range of heterogeneous features with popular machine learning algorithms. Specifically, four classical algorithms (random forest, support vector machines, extreme gradient boosting and multilayer perceptron) and three deep learning (DL) algorithms (convolutional neural networks, long short-term memory networks and deep neural networks) are employed to train 62 baseline models using these features. To integrate their individual strengths, DeepVF combines these baseline models into a final meta model using the stacking strategy. Extensive benchmarking experiments demonstrate the effectiveness of DeepVF: it achieves more accurate and stable performance than the baseline models on the benchmark dataset and clearly outperforms state-of-the-art VF predictors on the independent test.
(Reference: Xie R et al (2021) Brief Bioinform. 22(3): bbaa125).
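
The stacking strategy mentioned above feeds the outputs of several baseline models into a meta model that learns how much to trust each one. The following is a toy sketch of that idea in plain Python, not DeepVF's implementation: the three base models are fixed hypothetical functions, the data are invented, and the meta model is a small logistic regression trained by gradient descent. In a real stacking setup the base models are themselves trained, and the meta model is fitted on their out-of-fold predictions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical base models: each maps a feature vector to a probability.
def base_a(x):
    return sigmoid(4.0 * (x[0] - 0.5))   # looks only at feature 0


def base_b(x):
    return sigmoid(4.0 * (x[1] - 0.5))   # looks only at feature 1


def base_c(x):
    return 0.5                           # uninformative model


BASE_MODELS = [base_a, base_b, base_c]


def base_outputs(x):
    """Level-1 features: the probability predicted by every base model."""
    return [model(x) for model in BASE_MODELS]


def train_meta(X, y, epochs=2000, lr=0.5):
    """Fit logistic-regression weights on the base-model outputs."""
    Z = [base_outputs(x) for x in X]
    w = [0.0] * len(BASE_MODELS)
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for z, target in zip(Z, y):
            err = sigmoid(b + sum(wi * zi for wi, zi in zip(w, z))) - target
            for i, zi in enumerate(z):
                grad_w[i] += err * zi
            grad_b += err
        w = [wi - lr * g / len(Z) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(Z)
    return w, b


def predict(x, w, b):
    """Probability from the meta model for one feature vector."""
    z = base_outputs(x)
    return sigmoid(b + sum(wi * zi for wi, zi in zip(w, z)))


# Invented training data: label 1 when both features are high.
X_train = [[0.9, 0.8], [0.8, 0.7], [0.9, 0.9], [0.1, 0.2], [0.2, 0.1], [0.3, 0.2]]
y_train = [1, 1, 1, 0, 0, 0]

weights, bias = train_meta(X_train, y_train)
print(predict([0.85, 0.85], weights, bias) > 0.5)
print(predict([0.15, 0.15], weights, bias) > 0.5)
```

The meta model ends up assigning positive weight to the two informative base models, which is the sense in which stacking integrates the strengths of the individual models.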


VirulentHunter

VirulentHunter - is a novel deep learning framework designed to address the limitations of existing VF identification methods. Traditional methods primarily rely on homology alignment, which can miss novel or divergent VFs and lack effective means for VF functional classification. VirulentHunter works directly from protein sequences, using deep learning models to achieve simultaneous VF identification and classification.
(Reference: Chen C et al (2025) Brief Bioinformatics 26(3): bbaf271).


Effectidor

Effectidor - is a machine-learning-based web server for predicting type III secretion system effectors. The type III secretion system is an essential mechanism for host-pathogen interaction in the infection process.
(Reference: Wagner, N. et al. 2022. Bioinformatics, 38(8): 2341–2343).


Bastion3

Bastion3 - is a two-layer ensemble predictor developed to accurately identify type III secreted effectors from protein sequence data. In contrast with existing methods that employ single models with few features, Bastion3 explores a wide range of features of various types, trains single models on these features, and finally integrates these models through ensemble learning.
(Reference: Wang J et al. 2019. Bioinformatics, 35(12): 2017–2028).

Updated: December, 2025