Genome Annotation
DFAST
DFAST
- is a very quick prokaryotic genome annotation pipeline providing rich
information on pseudogenes, translation exceptions and orthologous gene
assignment between given reference genomes. DFAST also supports genome
submission to public sequence databases
(Reference: Tanizawa Y et al. (2018) Bioinformatics.
34(6): 1037-1039).
One of my favourite annotation pipelines due to its speed and
simplicity.
Bakta web server
Bakta web server
- is a user-friendly web interface for conducting and visualizing
annotations using Bakta without requiring command line expertise or local
computing resources. Key features include interactive visualizations
through circular genome plots, linear genome browsers, and searchable
data tables facilitating the interpretation of complex annotation results.
The web server generates standard bioinformatics outputs (GFF3, GenBank,
EMBL) and annotates diverse genomic features, including coding sequences,
non-coding RNAs and small open reading frames (sORFs)
(Reference: Beyvers S et al. (2025) Nucleic Acids
Research 53(W1): W51–W56).
Also available at
Galaxy.eu.
Requires registration.
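The GFF3 files returned by the Bakta web server can be summarised with a few lines of standard-library Python. This is a minimal sketch. It relies only on the nine tab-separated columns defined by the GFF3 specification (feature type in column 3) and on GFF3 allowing an optional ##FASTA section at the end of the file. The function name count_feature_types and the example lines are hypothetical, not part of Bakta.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Count GFF3 features by their type (column 3)."""
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line or line.startswith("#"):
            continue  # skip blank lines, ## directives and # comments
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a well-formed feature line
        counts[columns[2]] += 1
    return counts


# Hypothetical two-feature example (made up, not real Bakta output).
example = [
    "##gff-version 3",
    "contig_1\t.\tCDS\t1\t300\t.\t+\t0\tID=gene_1",
    "contig_1\t.\ttRNA\t400\t475\t.\t-\t.\tID=gene_2",
    "##FASTA",
    ">contig_1",
]
print(count_feature_types(example))
```

For a downloaded file, an open file handle can be passed directly, since iterating over it yields lines.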
pharokka
pharokka
- provides bacteriophage genome annotations in a fast, scalable and consistent fashion.
Pharokka identifies predicted coding sequences (CDS), transfer RNAs
(tRNAs), transfer-messenger RNAs (tmRNAs) and clustered regularly
interspaced short palindromic repeats (CRISPRs), providing functional
annotation for CDS using the PHROGs database
(Reference: Bouras G et al. (2023) Bioinformatics,
39(1): btac776).
Also available at
Google Colab.
Requires registration.
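What "identifying predicted coding sequences" means at its very simplest can be shown with a naive six-frame open reading frame (ORF) scan. This is a toy sketch. It assumes an ORF runs from an ATG to the first in-frame stop codon (TAA, TAG or TGA) and applies a hypothetical minimum length. It is not how pharokka predicts genes; real pipelines rely on dedicated gene-prediction programs that also handle alternative start codons, overlaps and coding statistics.

```python
from __future__ import annotations

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int]]:
    """Naive six-frame ORF scan: ATG start, first in-frame stop, length >= min_codons.

    Returns (strand, start, end) with 0-based, end-exclusive coordinates.
    For "-" hits the coordinates refer to the reverse-complemented sequence.
    The length in codons includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the currently open ATG, if any
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None  # look for the next ATG in this frame
    return orfs


print(find_orfs("CCATGAAACCCTAGGG", min_codons=4))
```

The default of 100 codons is an arbitrary illustrative cutoff; short phage genes would need a lower value.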
Proksee
Proksee
- provides users with a powerful, easy-to-use, and feature-rich system
for assembling, annotating, analysing, and visualizing bacterial genomes.
Proksee accepts Illumina sequence reads as compressed FASTQ files or
pre-assembled contigs in raw, FASTA, or GenBank format. Alternatively,
users can supply a GenBank accession or a previously generated Proksee
map in JSON format. Proksee then performs assembly (for raw sequence
data), generates a graphical map, and provides an interface for
customizing the map and launching further analysis jobs. Notable features
of Proksee include unique and informative assembly metrics provided via a
custom reference database of assemblies; a deeply integrated
high-performance genome browser for viewing and comparing analysis
results at individual base resolution (developed specifically for
Proksee); an ever-growing list of embedded analysis tools whose results
can be seamlessly added to the map or searched and explored in other
formats; and the option to export graphical maps, analysis results, and
log files for data sharing and research reproducibility
(Reference: Grant JR et al. (2023) Nucleic Acids Res.
51(W1): W484-W492).
RAST
RAST
(Rapid Annotation using Subsystem Technology) is a fully automated
service for annotating bacterial and archaeal genomes. It provides
high-quality annotations for these genomes across the whole
phylogenetic tree. Requires registration.
(Reference: Aziz RK et al. (2008) BMC Genomics 9: 75).
BV-BRC
BV-BRC
(Bacterial and Viral Bioinformatics Resource Center) - is a comprehensive resource supporting research on bacterial
and viral pathogens. It currently hosts over 14 million publicly available genomes and 33 high-throughput bioinformatic
analysis services with numerous visual analytic tools allowing researchers to analyze their private data, generate
comparisons with public data, and share data and results with colleagues.
(Reference: Shukla M et al. 2025. Nucl. Acids Res. gkaf1254).
BASys2
BASys2
(Bacterial Annotation System 2.0) - this powerful web server for comprehensive bacterial genome annotation accepts
either FASTA or FASTQ files. It identifies all gene types (protein-coding, tRNA, rRNA, etc.) and generates up to 62
annotation fields per gene using over 30 tools and 10 databases. The interactive genome viewer provides detailed,
multi-resolution visualizations and clickable gene cards, while also supporting metabolome annotations and 3D protein
structure visualizations. Annotations include structural, functional, and statistical data, with results available for
download in JSON and GenBank formats. BASys2 delivers fast, extensive, and high-quality genome annotations that rival
or exceed those in databases like UniProt.
(Reference: Poelzer J et al. 2025. Nucleic Acids Research 53(W1): W57 - W67).
MicroScope
MicroScope
- (CEA, Institut de Génomique - Genoscope, France) is a microbial genome
annotation & analysis platform which provides access to a wide range of
tools including COG analysis, comparative genomics ...
(Reference: Vallenet D et al. (2017) Nucleic Acids
Res. 45(D1): D517-D528).
Requires registration.
MAKER Web Annotation Service
MAKER Web Annotation Service
(MWAS) is an easily configurable, web-accessible genome annotation pipeline.
Its purpose is to allow research groups with small to intermediate
amounts of eukaryotic and prokaryotic genome sequence (e.g. BAC clones,
small whole genomes, preliminary sequencing data) to independently
annotate and analyse their data and produce output that can be loaded
into a genome database.
(Reference: Holt, C. & Yandell, M. 2011. BMC
Bioinformatics 12:491).
MITOS2
MITOS2
(part of Galaxy.org) - is a pipeline designed to provide consistent and
high-quality de novo annotation of Metazoan mitochondrial genome
sequences. Its authors show that MITOS results match RefSeq and MitoZoa in
terms of annotation coverage and quality, while avoiding the biases,
nomenclature inconsistencies and typos that originate from
manual curation strategies.
(Reference: M. Bernt et al. 2013. Molecular
Phylogenetics & Evolution 69:313-319).
GenSAS
GenSAS - Genome Sequence Annotation Server - provides a one-stop website with a single graphical interface for running multiple structural and functional annotation tools, enabling visualization and manual curation of genome sequences. Users can upload sequences into their account and, with custom parameter settings, run gene prediction programs and protein homology searches, map ESTs, and identify repeats, ORFs and SSRs. Each analysis is displayed on a separate track of the graphical interface, and custom editable tracks are used to select the final annotation of features and to create GFF3 files for upload to genome browsers such as GBrowse. Additional programs can easily be added to this Drupal-based software.
FLAN
FLAN
(FLu ANnotation) is an NCBI web server for the annotation of
user-provided influenza A virus or influenza B virus
sequences. It can validate and predict the protein sequences encoded by an
input flu sequence.
(Reference: Y. Bao et al. 2007. Nucleic Acids Res.
(Web Server issue) 35: W280-W284).
GATU
Genome Annotation Transfer Utility
(GATU)
annotates a genome based on a very closely related reference genome. The
proteins/mature peptides of the reference genome are BLASTed against the
genome to be annotated in order to locate the corresponding
genes/mature peptides in it.
(Reference: V. Tcherepanov et al. 2006. BMC Genomics
7:150).
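The core idea behind this kind of transfer, locating each reference gene in the new genome by sequence similarity, can be illustrated by post-processing tabular BLAST output. This is a minimal sketch. It assumes the reference proteins were searched against the new genome (for example with tblastn) and saved in the default 12-column -outfmt 6 layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The identity and e-value cutoffs and all function names are hypothetical. This is not GATU's own algorithm.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Hit:
    query: str       # reference protein / mature peptide ID (qseqid)
    subject: str     # contig of the genome being annotated (sseqid)
    pident: float    # percent identity
    sstart: int      # 1-based; larger than send for minus-strand hits
    send: int
    evalue: float
    bitscore: float


def parse_outfmt6(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse default 12-column BLAST tabular output (-outfmt 6)."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        yield Hit(f[0], f[1], float(f[2]), int(f[8]), int(f[9]),
                  float(f[10]), float(f[11]))


def transfer_annotations(lines: Iterable[str], min_pident: float = 80.0,
                         max_evalue: float = 1e-10) -> dict[str, tuple[str, int, int, str]]:
    """Map each reference query to (contig, start, end, strand) of its best hit."""
    best: dict[str, Hit] = {}
    for hit in parse_outfmt6(lines):
        if hit.pident < min_pident or hit.evalue > max_evalue:
            continue  # hypothetical cutoffs for a "very closely related" genome
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit  # keep the highest-bitscore hit per query
    result = {}
    for query, hit in best.items():
        strand = "+" if hit.sstart <= hit.send else "-"
        start, end = sorted((hit.sstart, hit.send))
        result[query] = (hit.subject, start, end, strand)
    return result
```

Keeping only the single best hit per reference gene suits a very closely related reference; paralogues or duplicated genes would need more careful handling.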
Companion
Companion -
allows non-experts to annotate their arthropod, fungal or protozoan genomes using a reference-based method, enabling them to
assess the output before submitting to public databases.
(Reference: Haese-Hill W et al. 2024. Nucleic Acids Research 52(W1): W39 - W44).
BioGPS
BioGPS (The Scripps Research Institute, USA) - is a one-stop gene annotation portal that emphasizes user customizability and community extensibility. It is a complete resource for learning about gene and protein function.
MOSGA
MOSGA
(Modular Open-Source Genome Annotator) - is a genome annotation framework for eukaryotic genomes with a user-friendly
web interface that generates and integrates annotations from various tools. The aggregated results can be analyzed with a
fully integrated genome browser and are provided in a format ready for submission to NCBI.
(Reference: Martin R et al. 2020. Bioinformatics 36: 5514-5515).
BAGEL
BAGEL (Groningen Biomolecular Sciences and Biotechnology Institute, Haren, the Netherlands) - determines the presence of bacteriocins in an existing or unsubmitted GenBank file, based on a database containing information on known bacteriocins and adjacent genes involved in bacteriocin activity. See LABiocin if you are interested in Lactic Acid Bacteria (LAB) and their bacteriocins.
MG-RAST
MG-RAST
(Metagenome Rapid Annotation using Subsystem Technology) is a
fully automated service for annotating metagenome samples. It provides
annotation of sequence fragments, their phylogenetic classification and
an initial metabolic reconstruction. The service also provides means for
comparing phylogenetic classifications and metabolic reconstructions of
metagenomes
(Reference: F. Meyer et al. 2008. BMC Bioinformatics
9: 386).
Mobile Genetic Elements - Not prophage
MOBHunter
MOBHunter
- mobile genetic elements (MGEs) range from small transposons to conspicuous integrative and conjugative elements. These
regions often confer advantageous traits, including antibiotic resistance or novel metabolic capabilities, and contain
foreign sequence signatures and hallmark genes such as transposases, integrases, etc. While bioinformatic tools target
specific MGE subsets using alignments, compositional signatures, or diagnostic gene mapping, no single platform offers a
unified framework for comprehensive, evidence-based, MGE identification and classification. MOBHunter is an advanced
bioinformatic pipeline that consolidates standalone tools and in-house algorithms.
(Reference: Rojas-Villalobos C et al. 2025. Nucleic Acids Research 53(W1): W398 - W407).
Chromosome replication origin:
Ori-Finder
Ori-Finder
- is a useful platform for the identification and analysis of replication
origins (oriCs) in bacterial genomes.
(Reference: Luo H et al. (2019) Brief Bioinform
20(4): 1114-1124).
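One classic signal used when looking for a bacterial oriC is GC skew, (G - C) / (G + C). In many bacteria the leading strand is enriched in G over C, so the skew changes sign at the origin and terminus. The cumulative skew along the chromosome therefore tends to reach its minimum near oriC. The sketch below computes only this signal. The window size and function names are hypothetical. Ori-Finder integrates several lines of evidence, so this is not its implementation and gives only a rough estimate.

```python
from __future__ import annotations


def cumulative_gc_skew(seq: str, window: int = 1000) -> list[tuple[int, float]]:
    """(position, cumulative skew) at every window boundary, starting at (0, 0.0).

    Each window contributes (G - C) / (G + C); windows without G or C add 0.
    """
    seq = seq.upper()
    total = 0.0
    points = [(0, 0.0)]
    for i in range(0, len(seq), window):
        chunk = seq[i:i + window]
        g, c = chunk.count("G"), chunk.count("C")
        if g + c:
            total += (g - c) / (g + c)
        points.append((i + len(chunk), total))
    return points


def estimate_origin(seq: str, window: int = 1000) -> int:
    """Position of the cumulative GC-skew minimum, a rough proxy for oriC.

    Chromosomes are circular, so a minimum at len(seq) is equivalent to position 0.
    """
    position, _ = min(cumulative_gc_skew(seq, window), key=lambda p: p[1])
    return position
```

The returned position is only as precise as the window size; a smaller window gives finer resolution but a noisier curve.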
OriV-Finder
OriV-Finder
- is a comprehensive web server for bacterial plasmid replication origin analysis. It uses replication initiation proteins
(RIPs) and sequence data to identify replication origins in plasmids.
(Reference: Li Y & Gao F et al. 2025. Nucleic Acids Research 53(W1): W451 - W456).
DoriC
Please note that these tools have been used to create
DoriC
- a database of replication origins in prokaryotic genomes including
chromosomes and plasmids.
(Reference: Luo H & Gao F (2019) Nucleic Acids Res.
47(D1): D74-D77).
Updated: February 2026