CONVERT

Several sites are available for converting sequences from one format to another. These include:

Genome2D Genome Tools (Dr. Anne de Jong, Molecular Genetics, University of Groningen, The Netherlands) - this is my go-to site for all manner of analyses. Under "Genome Tools" select "Conversions." This will allow you to convert a GenBank flatfile (gbk) to GFF (General Feature Format, table), CDS (coding sequences), Proteins (FASTA amino acids, faa), or DNA sequence (FASTA format).

Galaxy is an open, web-based platform for accessible, reproducible, and transparent computational biomedical research. This web server makes analysis tools, genomic data, tutorial demonstrations, persistent workspaces, and publication services available to any scientist. Extensive user documentation applicable to any public or local Galaxy instance is available. It offers a huge variety of tools for analysis and file interconversion.

Sequence Manipulation Suite (SMS) - this program allows you to remove digits and blank spaces from a sequence to make it suitable for other applications. Also found here.
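The clean-up step described above can be reproduced locally. The following is a minimal Python sketch, not the SMS code itself; the function name clean_sequence is hypothetical, and it assumes that everything other than digits and whitespace belongs to the sequence.

```python
import re


def clean_sequence(raw: str) -> str:
    """Remove digits and all whitespace (spaces, tabs, line breaks)."""
    return re.sub(r"[\d\s]+", "", raw)


# A fragment copied from the ORIGIN block of a GenBank record
raw = """
        1 atgaccatga ttacgccaag
       21 cttgcatgcc
"""
print(clean_sequence(raw))  # atgaccatgattacgccaagcttgcatgcc
```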

Sequence conversion (Bioinf @ Bugaco) - a huge suite of conversion tools.

Readseq, developed by D.G. Gilbert (Indiana University), reads and converts biosequences between a selection of common biological sequence formats, including EMBL, GenBank and FASTA. It is available here.

EMBOSS Seqret reads and writes (returns) sequences. It is useful for a variety of tasks, including extracting sequences from databases, displaying sequences, reformatting sequences, producing the reverse complement of a sequence, extracting fragments of a sequence, sequence case conversion or any combination of the above functions.

Format Converter - This program takes as input a sequence or sequences (e.g., an alignment) in an unspecified format and converts the sequence(s) to a different user-specified format. Also converts *.gbk to *.gff3.

ApolloRNA Convert data - transforms TransTermHP, CRISPRfinder, MOSAIC, PatScan, DARN! (GFF), and GenBank output data into GFF and GAME XML formats that can be read by Apollo.

GenBank Trans Extractor accepts a GenBank file as input and returns each of the protein translations described in the file in FASTA format. GenBank Trans Extractor should be used when you are more interested in the predicted protein translations of a DNA sequence than the DNA sequence itself. Part of the Sequence Manipulation Suite.
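For readers who want to see what this kind of extraction involves, here is a rough Python sketch. It is not the Sequence Manipulation Suite implementation: the function names and the "translation_N" headers are hypothetical, and it assumes that each protein is stored in a /translation="..." qualifier that may wrap over several lines.

```python
import re


def extract_translations(genbank_text: str) -> list[str]:
    """Return every /translation qualifier with line breaks and padding removed."""
    found = re.findall(r'/translation="([^"]+)"', genbank_text)
    return [re.sub(r"\s+", "", protein) for protein in found]


def to_fasta(proteins: list[str], width: int = 60) -> str:
    """Write the proteins as FASTA, using placeholder headers (an assumption)."""
    lines = []
    for number, protein in enumerate(proteins, start=1):
        lines.append(f">translation_{number}")
        lines.extend(protein[i:i + width] for i in range(0, len(protein), width))
    return "\n".join(lines)


sample = '''
     CDS             1..21
                     /locus_tag="EXAMPLE_0001"
                     /translation="MKTLLV
                     AG"
     CDS             complement(30..41)
                     /translation="MSE"
'''
print(to_fasta(extract_translations(sample)))
```

A full parser (for example the one in Biopython) would also carry over the locus tag or protein ID into the FASTA header; the sketch only numbers the records.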

FeatureExtract 1.2 - extracts sequence and feature annotation, such as intron/exon structure, from GenBank entries and other GenBank format files. (Reference: R. Wernersson (2005) Nucleic Acids Res. 33(Web Server issue): W567–W569).

Sequence editor (part of Shiladitya DasSarma's HaloWeb: The Haloarchaeal Genomes Database) - converts DNA and RNA sequences. Generate antiparallel, complement and inverse sequences.
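The three transformations named above are simple string operations. The sketch below is illustrative only and is not taken from HaloWeb; it assumes that "inverse" means the reversed sequence and "antiparallel" means the reverse complement, and it handles unambiguous DNA letters only.

```python
# Translation table for unambiguous DNA bases, preserving case
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq: str) -> str:
    """Complement each base without changing the order."""
    return seq.translate(_DNA_COMPLEMENT)


def inverse(seq: str) -> str:
    """Reverse the order of the bases without complementing them."""
    return seq[::-1]


def antiparallel(seq: str) -> str:
    """Reverse complement: the opposite strand read 5' to 3'."""
    return complement(seq)[::-1]


print(complement("ATGC"))    # TACG
print(inverse("ATGC"))       # CGTA
print(antiparallel("ATGC"))  # GCAT
```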

Format conversion - the input type (single sequence, set of sequences, alignment, tree, matrix, ...) and format are automatically recognized. Output: FASTA, NEXUS, PHYLIP, Clustal, EMBL, Newick, New Hampshire.

Fasta dataset splitter - Part of FaBox (see below)

GenBank 2 Sequin (P. Lehwark & S. Greiner, Max-Planck Institute for Molecular Plant Physiology, Germany) - this extremely useful program is designed to convert revised GeSeq output into the Sequin format, which used to be required for NCBI submission. It also generates formats that can be used for small genome submissions. Nonetheless, any custom GenBank file can be prepared for NCBI submission using GenBank 2 Sequin.

JaMBW (European Molecular Biology Laboratory of Heidelberg, Germany) - Java based Molecular Biologist's Workbench. Select Chapter 1 for sequence format conversion (upper/lower case; T/U; reverse or complement sequence).

Nucleic Acid Sequence Massager (Allotron Biosensor Corporation) - in addition to removing spurious material (numbers, breaks, HTML, spaces), it changes the format (upper to lower case, complement, reverse, RNA to DNA, and triplets).

extractUpStreamDNA (A. Villegas, Public Health Ontario) - takes a GenBank flatfile (*.gbk) as input and, for every CDS that it finds, extracts a pre-determined length of DNA upstream (the length is an argument and will include 3 nt for the initiation codon). The output is an FFN file of these upstream DNA sequences. N.B. this only works for prokaryotic sequences because it does not handle the splits or joins found in eukaryotic sequences. The data can then be analyzed with programs such as MEME. This program is temporarily unavailable online, though one can download it from here.
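The coordinate arithmetic behind this kind of extraction is easy to get wrong on the complementary strand, so a small Python sketch may help. It is not the extractUpStreamDNA source: the function names are hypothetical, the CDS coordinates are assumed to be already parsed (1-based, inclusive, as in GenBank), and the sketch assumes the requested length is counted in addition to the 3 nt of the initiation codon.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(genome: str, start: int, end: int, strand: int, length: int) -> str:
    """Return `length` nt upstream of a CDS plus its 3 nt initiation codon.

    start/end are 1-based inclusive; strand is +1 or -1.
    The region is clipped at the ends of the sequence (no wrap-around).
    """
    if strand == 1:
        first = max(0, start - 1 - length)   # 0-based start of the slice
        return genome[first:start + 2]       # up to the third codon base
    # Minus strand: the initiation codon sits at the high-coordinate end
    last = min(len(genome), end + length)
    return reverse_complement(genome[end - 3:last])


genome = "AAACCCATGGGGTTT"
print(upstream_region(genome, 7, 15, +1, 4))  # ACCCATG
print(upstream_region(genome, 1, 9, -1, 4))   # ACCCCAT
```

Circular genomes and joined features are deliberately left out, in line with the prokaryote-only caveat above.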

Convert GenBank to Fasta (G. Rocap, School of Oceanography, University of Washington, U.S.A.) - Select a GenBank formatted file containing a feature table. Select whether to extract translated peptide sequences, DNA sequence for each feature, or the entire DNA sequence of the whole record. If you choose "Peptide Sequence", your feature table must have "translation" sub-features.

GenBank-JSON-Conversion - this converter accepts multi-sequence GenBank, DDBJ, or EMBL/ENA files (*.gb, *.gbk, *.gbf, *.genbank, *.gbff, *.ena, *.embl or *.txt). Alternatively, enter NCBI accession number(s) or upload them in a *.csv file.

FaBox (Palle Villesen Fredsted, Aarhus University, Denmark) - an online fasta sequence toolbox, including Fasta header editor, Fasta header replacer, Fasta sequence extractor, Fasta sequence subtractor, Fasta sequence joiner, and Fasta dataset splitter/divider.
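As an illustration of what a dataset splitter does, the following Python sketch reads FASTA text and divides the records into groups of a fixed size. It is not FaBox code; the function names and the records-per-group parameter are hypothetical.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_records(records: list, per_group: int) -> list[list]:
    """Divide the records into consecutive groups of at most per_group items."""
    return [records[i:i + per_group] for i in range(0, len(records), per_group)]


fasta = ">a\nAC\nGT\n>b\nTT\n>c\nGG\n"
for group in split_records(parse_fasta(fasta), per_group=2):
    print([header for header, _ in group])
```

Each group could then be written to its own file; that step is omitted to keep the sketch short.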

SeqScrub - a web application that cleans up FASTA file headers and appends information from external databases. (Reference: Foley G et al. (2019) BioTechniques 67(2): 50-54).

PAL2NAL - a program that converts a multiple sequence alignment of proteins and the corresponding DNA (or mRNA) sequences into a codon alignment. The program automatically assigns the corresponding codon sequence even if the input DNA sequence has mismatches with the input protein sequence, or contains UTRs or polyA tails. It can also deal with frame shifts in the input alignment, which is suitable for the analysis of pseudogenes. The resulting codon alignment can further be subjected to the calculation of synonymous (d_S) and non-synonymous (d_N) substitution rates. (Reference: Suyama M et al. (2006) Nucl Acids Res 34: W609-W612).
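The core idea of a codon alignment can be shown with a much simpler Python sketch than PAL2NAL itself. The function below (hypothetical name thread_codons) threads one coding sequence onto its aligned protein, writing three gap characters for every protein gap. It assumes the DNA starts in frame at the first residue and, unlike PAL2NAL, does not check for mismatches, UTRs, or frame shifts.

```python
def thread_codons(aligned_protein: str, cds: str) -> str:
    """Map an in-frame coding sequence onto a gapped protein alignment row."""
    residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) < 3 * residues:
        raise ValueError("coding sequence is too short for the protein")
    codons = (cds[i:i + 3] for i in range(0, 3 * residues, 3))
    # Each gap becomes '---'; each residue consumes the next codon.
    # Any trailing bases (for example a stop codon) are ignored.
    return "".join("---" if aa == "-" else next(codons) for aa in aligned_protein)


print(thread_codons("M-K", "ATGAAA"))     # ATG---AAA
print(thread_codons("MK-", "ATGAAATAA"))  # ATGAAA---
```

Applying the function to every row of a protein alignment gives rows of equal length, which is the form expected by programs that estimate d_S and d_N.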

Shuffle DNA and Sequence Randomizer permit one to randomize a sequence to compare with one's own.
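Randomizing a sequence while keeping its composition is a one-line operation in most languages. The Python sketch below is an illustration, not the code behind either site; the function name and the optional seed argument are assumptions added for reproducibility.

```python
import random


def shuffle_sequence(seq: str, seed=None) -> str:
    """Return the same letters in random order (composition is preserved)."""
    rng = random.Random(seed)
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


original = "ATGACCATGATTACGCCAAGCTTG"
control = shuffle_sequence(original, seed=1)
print(control)
print(sorted(control) == sorted(original))  # True
```

A shuffled control of this kind keeps base composition but destroys any motif, which is what makes it useful as a comparison for one's own sequence.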

Random Sequence Generator (Vladimír Cermák, molbiotools.com) - generate random DNA, RNA or protein sequences. Based on the Mersenne Twister algorithm. No unwanted repeats are generated even in very long sequences. Can be used for calculations of DNA, RNA and protein molecular weights and for string reverse and complement transformations.
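Python's standard random module is also built on the Mersenne Twister, so a comparable generator can be sketched in a few lines. This is not the molbiotools.com implementation; the function name, the alphabet table, and the uniform letter frequencies are assumptions.

```python
import random

# Alphabets are an assumption: unambiguous bases and the 20 standard amino acids
ALPHABETS = {
    "dna": "ACGT",
    "rna": "ACGU",
    "protein": "ACDEFGHIKLMNPQRSTVWY",
}


def random_sequence(length: int, kind: str = "dna", seed=None) -> str:
    """Draw `length` letters uniformly at random from the chosen alphabet."""
    rng = random.Random(seed)  # Mersenne Twister generator
    return "".join(rng.choices(ALPHABETS[kind], k=length))


print(random_sequence(30, "dna", seed=42))
print(random_sequence(30, "protein", seed=42))
```

Passing the same seed reproduces the same sequence, which is convenient when a random control needs to be regenerated later.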

UPDATED: August, 2025