Conversion of gbk genome data into a form compatible with CoreGenes

  1. The first step is to extract the proteins from the gbk file. For this I use Genome2D, which gives results of the following type for phage 4HA13 (locus tag AC4HA13):

    >AC4HA13_000
    MKPNYVAIRKSKEAMFHRFIEAKRKAELEGKVVIKKKKNKKKNYNIFFFLLSR
    
    >AC4HA13_010
    MTYYDADLGLVMCESELSLEILDALDWEHELPKGEPQWGDDDYVYVAPTDEFDIPF
            
  2. Copy and paste these into Notepad.


  3. Check for spurious characters and delete them. I sometimes find quotation marks (").


  4. Use the Replace feature of Notepad to replace >AC4HA13_ with >gp|AC4HA13|AC4HA13_% giving:

    >gp|AC4HA13|AC4HA13_%000
    MKPNYVAIRKSKEAMFHRFIEAKRKAELEGKVVIKKKKNKKKNYNIFFFLLSR
    
    >gp|AC4HA13|AC4HA13_%010
    MTYYDADLGLVMCESELSLEILDALDWEHELPKGEPQWGDDDYVYVAPTDEFDIPF
            
  5. The tedious part: paste ||[Escherichia phage 4HA13] at the end of each FASTA header line (the lines starting with >):

    >gp|AC4HA13|AC4HA13_%000||[Escherichia phage 4HA13]
    MKPNYVAIRKSKEAMFHRFIEAKRKAELEGKVVIKKKKNKKKNYNIFFFLLSR
    
    >gp|AC4HA13|AC4HA13_%010||[Escherichia phage 4HA13]
    MTYYDADLGLVMCESELSLEILDALDWEHELPKGEPQWGDDDYVYVAPTDEFDIPF
            
  6. Paste this into Custom Data in CoreGenes 3.5
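
Steps 3 to 5 can also be scripted instead of being done by hand in Notepad. The following is a minimal Python sketch, not part of the original workflow. The function name `to_coregenes` and its parameters are hypothetical. It assumes every header has the form `>LOCUSTAG_NNN`, as in the Genome2D output shown in step 1, and it reproduces the header layout shown in step 5.

```python
def to_coregenes(fasta_text, locus_tag, organism):
    """Rewrite FASTA headers into the layout shown in step 5.

    Hypothetical helper: assumes headers look like '>LOCUSTAG_NNN'.
    """
    prefix = ">" + locus_tag + "_"
    out = []
    for raw in fasta_text.splitlines():
        # Step 3: remove stray quotation marks and surrounding whitespace.
        line = raw.replace('"', "").strip()
        if not line:
            continue  # blank lines are re-inserted between records below
        if line.startswith(">"):
            if not line.startswith(prefix):
                raise ValueError("unexpected header: " + line)
            suffix = line[len(prefix):]
            if out:
                out.append("")  # one blank line between records
            # Steps 4 and 5: add the gp|...| prefix and the organism suffix.
            out.append(f">gp|{locus_tag}|{locus_tag}_%{suffix}||[{organism}]")
        else:
            out.append(line)  # sequence lines are passed through unchanged
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = (
        ">AC4HA13_000\n"
        "MKPNYVAIRKSKEAMFHRFIEAKRKAELEGKVVIKKKKNKKKNYNIFFFLLSR\n"
        "\n"
        ">AC4HA13_010\n"
        "MTYYDADLGLVMCESELSLEILDALDWEHELPKGEPQWGDDDYVYVAPTDEFDIPF\n"
    )
    print(to_coregenes(example, "AC4HA13", "Escherichia phage 4HA13"), end="")
```

The printed text can then be pasted into Custom Data as in step 6. The script raises an error on any header that does not start with the expected locus tag, so a mistyped tag is caught before the data reaches CoreGenes.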

Updated: January, 2026