GenBank submission

After doing DNA analyses, there comes a moment when you have to submit the sequences to GenBank in order to obtain accession numbers for your manuscript. For the previous manuscript, the sequences were few and from ITS/28S only, and my co-author Dick Groenenberg did the submission as part of the laboratory routine.

This time, I had more sequences, and from different loci. As I thought the submission might be less straightforward, I asked Dick for some assistance. We worked out the following procedure:

1. Open your sequence alignment in MacClade; make sure all base pairs are
coded correctly and that missing bases (N) and gaps (-) are indicated. Ensure
that all sequences are of equal length.
2. Save the file and export it in Fasta format.

3. Open the Fasta file in a text editor (e.g. TextWrangler).
4. In the header of each sequence, add the organism name and identifier, e.g.
[organism=Thaumastus thompsoni] cytochrome oxidase subunit I gene,
partial cds. Save the file [1].
5. Open alignment again in MacClade; select all, Characters>Genetic
code>Drosophila mt, Characters>Codon positions>Calculate codon
positions>choose to minimize stop codons; Characters>Data
format>Translate to protein>consider gaps.
6. Colour cells to protein: Display>Amino Acid Translation>Show
translated AAs.
7. Check for the absence of stop codons (coloured black).
8. In the case of CO1, check for the presence of the motif VMIFF.
9. Export in Fasta format, save file [2].
10. Open Sequin and start a new submission with submission type Phylogenetic
study. Import the nucleotide alignment [1]; on the organism tab, check location
(mitochondrion/genomic) and genetic code (invertebrate
mitochondrial/standard); EITHER import the protein file [2] and check that
Sequin matches both files correctly, OR let Sequin translate to proteins
automatically (but check against file [2]); save the file with the sqn extension.
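
Steps 1 to 4 can also be double-checked with a small script before opening Sequin. What follows is only a sketch, not part of the procedure we actually used: the function names, the sequence IDs and the dictionary mapping IDs to organism names are all hypothetical, and it assumes each Fasta header starts with a sequence ID and that only A, C, G, T, N and - should occur in the alignment.

```python
def read_fasta(text):
    """Parse Fasta text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_alignment(records, allowed="ACGTN-"):
    """Return a list of problems: unequal lengths or unexpected characters."""
    problems = []
    lengths = {len(seq) for _, seq in records}
    if len(lengths) > 1:
        problems.append(f"unequal sequence lengths: {sorted(lengths)}")
    for header, seq in records:
        bad = sorted(set(seq.upper()) - set(allowed))
        if bad:
            problems.append(f"{header}: unexpected characters {bad}")
    return problems


def annotate(records, organisms, description):
    """Return Fasta text with '[organism=...] description' in each header."""
    lines = []
    for header, seq in records:
        seq_id = header.split()[0]  # assumption: the ID is the first word
        lines.append(f">{seq_id} [organism={organisms[seq_id]}] {description}")
        lines.append(seq)
    return "\n".join(lines) + "\n"


# Hypothetical two-sequence alignment, just to show the calls.
example = ">seq1\nACGT-N\n>seq2\nACGTTA\n"
records = read_fasta(example)
print(check_alignment(records))
print(annotate(
    records,
    {"seq1": "Thaumastus thompsoni", "seq2": "Thaumastus thompsoni"},
    "cytochrome oxidase subunit I gene, partial cds",
))
```

If your alignment legitimately contains other ambiguity codes, extend the `allowed` string rather than editing the sequences.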

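The translation checks in steps 5 to 8 above can be reproduced roughly outside MacClade as well. The sketch below assumes the invertebrate mitochondrial genetic code (NCBI translation table 5); the function names are made up for illustration, and treating any codon with an N or a partial gap as 'X' is my own simplification, not necessarily what MacClade or Sequin do.

```python
BASES = "TCAG"
# Amino acids of NCBI translation table 5 (invertebrate mitochondrial),
# listed in TCAG codon order; '*' marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CCWW"
    "LLLLPPPPHHQQRRRR"
    "IIMMTTTTNNKKSSSS"
    "VVVVAAAADDEEGGGG"
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
TABLE5 = dict(zip(CODONS, AMINO_ACIDS))


def translate(seq, frame=0):
    """Translate a nucleotide sequence starting at offset `frame` (0, 1 or 2)."""
    seq = seq.upper()
    protein = []
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon == "---":
            protein.append("-")  # a full-codon gap stays a gap
        else:
            # Codons containing N or a partial gap become 'X' (assumption).
            protein.append(TABLE5.get(codon, "X"))
    return "".join(protein)


def best_frame(seq):
    """Return the frame (0, 1 or 2) with the fewest stop codons."""
    return min(range(3), key=lambda f: translate(seq, f).count("*"))


def check_co1(seq):
    """Report frame, stop-codon count and presence of the VMIFF motif."""
    frame = best_frame(seq)
    protein = translate(seq, frame)
    return {
        "frame": frame,
        "stops": protein.count("*"),
        "has_VMIFF": "VMIFF" in protein,
    }


# Made-up fragment that encodes the motif, only to show the call.
print(check_co1("GTTATAATTTTTTTC"))
```

When two frames tie on stop-codon count, `best_frame` returns the lowest one, so a clean result here is a hint to look at the translation rather than proof that the frame is right.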

The tricky part is the use of Sequin. While there is an on-line help file to walk you through the successive windows, this doesn’t help when you are actually trapped by pop-up windows reporting errors. At one point, we encountered a problem with the translation from nucleotides to proteins. Whatever we tried, the error remained in the first position (‘-’ instead of ‘T’). Needless to say, we started from scratch several times, trying different options.


While the error report was still pending with GenBank, I managed to get the translation correct. However, the final sqn-file still reported errors in some of the sequence features, though no fatal ones.
The sequences are submitted but will only appear in GenBank once the first manuscript is published.


