Specialized Annotation - Virulence Determinants
VirulenceFinder 2.0
VirulenceFinder 2.0 (Danish Technical University) – identification of virulence genes. The method uses BLAST for identification of known virulence genes in Escherichia coli. The method is being extended to also include virulence genes for Enterococcus and Staphylococcus aureus. As input, the method can use both pre-assembled, complete or partial genomes, and short sequence reads from four different sequencing platforms.
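As a rough illustration of BLAST-based gene identification, the sketch below filters tabular BLAST output (the standard 12-column `-outfmt 6` layout) by percent identity and by the fraction of the reference gene covered by the alignment. It is not VirulenceFinder's own code: the function name, the thresholds (90% identity, 60% coverage), and the toy gene lengths and hits are assumptions made for the example.

```python
import csv
import io

# Column order of BLAST tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

def filter_virulence_hits(blast_tsv, gene_lengths, min_identity=90.0, min_coverage=0.6):
    """Return the best-scoring hit per reference gene that passes both cutoffs.

    blast_tsv    -- text of a BLAST -outfmt 6 report (query = contig, subject = reference gene)
    gene_lengths -- hypothetical dict mapping reference gene id -> gene length in bp
    """
    best = {}
    reader = csv.DictReader(io.StringIO(blast_tsv), fieldnames=BLAST_COLUMNS, delimiter="\t")
    for row in reader:
        gene = row["sseqid"]
        identity = float(row["pident"])
        # Fraction of the reference gene covered by the alignment.
        aligned = abs(int(row["send"]) - int(row["sstart"])) + 1
        coverage = aligned / gene_lengths[gene]
        if identity < min_identity or coverage < min_coverage:
            continue
        bitscore = float(row["bitscore"])
        if gene not in best or bitscore > best[gene]["bitscore"]:
            best[gene] = {"contig": row["qseqid"], "identity": identity,
                          "coverage": round(coverage, 3), "bitscore": bitscore}
    return best

# Toy report: values are invented for illustration only.
example = "\n".join([
    "contig1\tstx2A\t99.5\t960\t5\t0\t100\t1059\t1\t960\t0.0\t1750",
    "contig2\teae\t85.0\t2800\t420\t0\t1\t2800\t1\t2800\t0.0\t3000",
    "contig3\thlyA\t97.0\t500\t15\t0\t1\t500\t1\t500\t1e-100\t800",
])
lengths = {"stx2A": 960, "eae": 2805, "hlyA": 3075}
hits = filter_virulence_hits(example, lengths)
print(sorted(hits))
```

In this toy report only the first hit survives: the second fails the identity cutoff and the third covers too little of its reference gene. Loosening `min_identity` trades specificity for sensitivity to divergent alleles.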
T3DB
(the Toxin and Toxin Target Database) - combines detailed toxin data with
comprehensive toxin target information. The database currently houses
3,053 toxins which are linked to 1,670 corresponding toxin target
records. Each toxin record (ToxCard) contains over 50 data fields and
holds information such as chemical properties and descriptors, toxicity
values, molecular and cellular interactions, and medical information.
(Reference: Lim E et al. 2010. Nucleic Acids Res.
38(Database issue): D781-786).
VFDB
- is an integrated and comprehensive database of virulence factors for
bacterial pathogens (also including Chlamydia and Mycoplasma).
(Reference: L.H. Chen et al. 2012. Nucleic Acids Res.
40(Database issue): D641-D645).
PAIDB
(Pathogenicity Island Database) - Pathogenicity islands (PAIs) and
resistance islands (REIs) are key to the evolution of pathogens and
appear to play complementary roles in the process of bacterial infection.
While PAIs promote disease development, REIs give a fitness advantage to
the host against multiple antimicrobial agents. An ancillary program,
PAI Finder, identifies PAI-like regions or REI-like regions in a
multi-sequence query.
(Reference: S.H Yoon et al. 2015. Nucl. Acids Res.
43 (D1): D624-D630).
IslandViewer
- includes a new interactive genome visualization tool, IslandPlot, and
expanded virulence factor, antimicrobial resistance gene, and
pathogen-associated gene annotations, as well as homologs of these genes
in closely related genomes. Notably, IslandViewer 4 accepts incomplete
genomes as input, though its developers strongly urge users to submit
complete genomes whenever possible.
(Reference: B.K. Dhillon et al. 2015. Nucl. Acids
Res. 43 (W1): W104-W108).
Gypsy Database
- an open editable database about the evolutionary relationship of
viruses, mobile genetic elements (MGEs; Ty3/Gypsy, Retroviridae,
Ty1/Copia and Bel/Pao LTR retroelements and the Caulimoviridae
pararetroviruses of plants) and other genomic repeats. Equipped for BLAST
and HMM searches.
(Reference: Llorens, C et al. 2011. Nucl. Acids Res.
39(suppl 1): D70-D74).
PathogenFinder 1.1
(Danish Technical University) – Based on complete genomes from 513 bacteria
annotated as human non-pathogens and 372 bacteria annotated as human
pathogens, a database of protein families that are associated mainly with
either non-pathogens or pathogens has been created. This database is then
used to predict the pathogenic potential of bacteria.
As input, the method can use both pre-assembled, complete or partial
genomes, and short sequence reads from four different sequencing
platforms.
(Reference: Cosentino S et al. 2013. PLoS ONE 8:
e77302)
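The family-based prediction described above can be pictured as a simple weighted vote. The sketch below is only an illustration under stated assumptions: the family identifiers, the weights, the scoring rule, and the function name `pathogenicity_score` are hypothetical and are not taken from PathogenFinder.

```python
def pathogenicity_score(genome_families, family_weights):
    """Score a genome from the protein families it contains.

    genome_families -- set of family ids found in the query genome
    family_weights  -- hypothetical dict: family id -> weight, positive for
                       families seen mainly in pathogens, negative for
                       families seen mainly in non-pathogens
    Returns (score in [0, 1], sorted list of matched families); the score is
    None when the genome contains no informative family.
    """
    # Keep only families that carry a non-zero weight.
    matched = sorted(f for f in genome_families if family_weights.get(f, 0) != 0)
    if not matched:
        return None, matched
    pathogen = sum(family_weights[f] for f in matched if family_weights[f] > 0)
    non_pathogen = -sum(family_weights[f] for f in matched if family_weights[f] < 0)
    # Share of the total evidence that points towards "pathogen".
    return pathogen / (pathogen + non_pathogen), matched

# Invented families and weights, for illustration only.
weights = {"FAM_toxin": 2.0, "FAM_adhesin": 1.0, "FAM_commensal": -1.0}
score, matched = pathogenicity_score({"FAM_toxin", "FAM_commensal", "FAM_other"}, weights)
print(score, matched)
```

Returning the matched families alongside the score keeps the prediction inspectable, since a user can see which families drove it.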
TASmania
- is a bacterial toxin-antitoxin systems (TAS) database built by mining
over 41K assemblies of the EnsemblBacteria database for known and
uncharacterized protein components of type I to IV TAS loci.
(Reference: Akarsu H et al. (2019) PLoS Comput Biol
15(4): e1006946).
VirulentPred
- is an SVM-based method to predict bacterial virulent protein sequences,
which can be used to screen virulent proteins in proteomes. Together with
experimentally verified virulent proteins, several putative, non-annotated
and hypothetical protein sequences have been predicted to be high scoring
virulent proteins by the prediction method. Version 2 has achieved 84.71%
accuracy with the validation dataset and 85.18% on an independent test
dataset.
(Reference: Sharma A et al. (2023) Protein Sci 32(12):
e4808).
DeepVF
- explores a wide range of heterogeneous features with popular machine
learning algorithms. Specifically, four classical algorithms, including
random forest, support vector machines, extreme gradient boosting and
multilayer perceptron, and three deep learning (DL) algorithms, including convolutional
neural networks, long short-term memory networks and deep neural networks
are employed to train 62 baseline models using these features. In order
to integrate their individual strengths, DeepVF effectively combines these
baseline models to construct the final meta model using the stacking
strategy. Extensive benchmarking experiments demonstrate the effectiveness
of DeepVF: it achieves a more accurate and stable performance compared
with baseline models on the benchmark dataset and clearly outperforms
state-of-the-art VF predictors on the independent test.
(Reference: Xie R et al (2021) Brief Bioinform.
22(3): bbaa125).
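The stacking strategy can be illustrated with a deliberately small, pure-Python sketch. It is not DeepVF's implementation: the toy threshold base models, the logistic-regression meta model, the synthetic data, and all names below are assumptions chosen to keep the example self-contained. The point it shows is that the meta model is trained on out-of-fold predictions of the base models.

```python
import math
import random

class Stump:
    """Toy base model: thresholds one feature at the midpoint of the class means."""
    def __init__(self, feature):
        self.feature = feature

    def fit(self, X, y):
        pos = [x[self.feature] for x, label in zip(X, y) if label == 1]
        neg = [x[self.feature] for x, label in zip(X, y) if label == 0]
        pos_mean, neg_mean = sum(pos) / len(pos), sum(neg) / len(neg)
        self.mid = (pos_mean + neg_mean) / 2
        self.sign = 1.0 if pos_mean >= neg_mean else -1.0
        return self

    def predict_proba(self, x):
        return 1 / (1 + math.exp(-self.sign * (x[self.feature] - self.mid)))

class Logistic:
    """Tiny logistic regression trained by batch gradient descent (the meta model)."""
    def fit(self, Z, y, lr=0.5, epochs=500):
        self.w, self.b = [0.0] * len(Z[0]), 0.0
        for _ in range(epochs):
            grad_w, grad_b = [0.0] * len(self.w), 0.0
            for z, label in zip(Z, y):
                err = self.predict_proba(z) - label
                grad_b += err
                for j, zj in enumerate(z):
                    grad_w[j] += err * zj
            self.b -= lr * grad_b / len(Z)
            self.w = [w - lr * g / len(Z) for w, g in zip(self.w, grad_w)]
        return self

    def predict_proba(self, z):
        s = self.b + sum(w * zj for w, zj in zip(self.w, z))
        return 1 / (1 + math.exp(-s))

def fit_stack(X, y, n_features, folds=4):
    # Out-of-fold base predictions become the meta model's training features.
    Z = [[0.0] * n_features for _ in X]
    for k in range(folds):
        train = [i for i in range(len(X)) if i % folds != k]
        for j in range(n_features):
            model = Stump(j).fit([X[i] for i in train], [y[i] for i in train])
            for i in range(k, len(X), folds):
                Z[i][j] = model.predict_proba(X[i])
    meta = Logistic().fit(Z, y)
    # Refit the base models on all data for use at prediction time.
    bases = [Stump(j).fit(X, y) for j in range(n_features)]
    return bases, meta

def predict_stack(bases, meta, x):
    return meta.predict_proba([b.predict_proba(x) for b in bases])

# Synthetic data: two informative features and one pure-noise feature.
random.seed(0)
y = [i % 2 for i in range(80)]
X = [[label + random.gauss(0, 0.7), label + random.gauss(0, 0.7), random.gauss(0, 1)]
     for label in y]
bases, meta = fit_stack(X, y, n_features=3)
accuracy = sum((predict_stack(bases, meta, x) >= 0.5) == bool(label)
               for x, label in zip(X, y)) / len(y)
print(f"training accuracy: {accuracy:.2f}")
```

Training the meta model on out-of-fold predictions, rather than on predictions for data the base models have already seen, keeps it from simply rewarding whichever base model overfits most.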
VirulentHunter
- is a novel deep learning framework designed to address the limitations
of existing virulence factor (VF) identification methods. Traditional methods primarily rely
on homology alignment, which can miss novel or divergent VFs and lack
effective means for VF functional classification. VirulentHunter works
directly from protein sequences, using deep learning models to achieve
simultaneous VF identification and classification.
(Reference: Chen C et al (2025) Brief Bioinformatics
26(3): bbaf271).
Effectidor
- is a machine-learning-based web server for predicting type III secretion
system effectors. The type III secretion system is an essential mechanism
for host-pathogen interaction in the infection process.
(Reference: Wagner, N. et al. 2022. Bioinformatics,
38(8): 2341–2343).
Bastion3
- is a two-layer ensemble predictor developed to accurately identify type
III secreted effectors from protein sequence data. In contrast with
existing methods that employ single models with few features, Bastion3
explores a wide range of features, from various types, trains single
models based on these features and finally integrates these models through
ensemble learning.
(Reference: Wang J et al. 2019. Bioinformatics, 35(12):
2017–2028).
Updated: December, 2025