Help:Protein coding sequences/Design

Protein coding sequences are DNA sequences that are transcribed into mRNA, which is then translated into a polypeptide chain. Every three nucleotides in a protein coding sequence form a codon, and each codon encodes one amino acid in the polypeptide chain. In some cases, different chassis may either map a given codon to a different amino acid or use particular codons more or less frequently. Therefore, some protein coding sequences may be optimized for use in a particular chassis.
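
As an illustration of the codon-to-amino-acid mapping described above, here is a minimal Python sketch. It assumes the standard genetic code; a chassis that uses a different code would need a different table. The names `CODON_TABLE` and `translate` are made up for this example and are not part of any Registry tool.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
# Assumption: the chassis uses the standard code; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a DNA coding sequence, one amino acid per three-base codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return "".join(CODON_TABLE[codon] for codon in codons)


print(translate("ATGAAATAATAA"))  # MK**
```

The two trailing `*` characters correspond to the two stop codons at the end of the example sequence.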

In the Registry, protein coding sequences begin with a start codon (usually ATG) and end with a double stop codon (TAA TAA).
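
A quick way to check a candidate sequence against this convention is sketched below. The function name `check_framing` and the report keys are hypothetical. The check treats ATG as the only accepted start codon, which is stricter than the "usually ATG" wording above, so a False value for the start codon is a prompt to look closer rather than a definite error.

```python
def check_framing(cds):
    """Report whether a coding sequence follows the ATG ... TAATAA convention."""
    cds = cds.upper()
    return {
        # A complete reading frame has a whole number of codons.
        "length_multiple_of_3": len(cds) % 3 == 0,
        # Assumption: only ATG is accepted here, although other starts exist.
        "starts_with_ATG": cds.startswith("ATG"),
        # The double stop codon TAA TAA, written without the space.
        "ends_with_TAATAA": cds.endswith("TAATAA"),
    }


print(check_framing("ATGAAATAATAA"))
print(check_framing("ATGAAATAG"))
```

The first call reports True for all three checks; the second reports False for the double stop codon because the sequence ends in a single TAG.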

Although protein coding sequences are often considered to be basic parts, they can themselves be composed of one or more regions, called protein domains. Thus, a protein coding sequence could be entered either as a basic part or as a composite part of two or more protein domains.

Before you design a new protein coding sequence, check whether it already exists in the Registry. Many commonly used proteins, such as fluorescent proteins and repressors, are already available in the Registry.

Here are some basic things to consider when designing protein coding sequences.

  1. A protein coding sequence contains one or more Internal domains. Design the internal domains so that the codon usage is optimized for the chassis in which the part will be used.
  2. If you also want to control where the protein is located in the cell, you might consider a localization sequence as the Head domain.
  3. If you also want to purify or quantify the protein later, you might add an affinity tag as the Head or Tail domain.
  4. The Tail domain of protein coding sequences is made up of a double stop codon TAATAA.
    1. A single TAG, or TAGTAG, may create an illegal restriction site (XbaI, SpeI), depending on the preceding nucleotides.
  5. If you want to control the degradation rate of the protein in E. coli, you might add a degradation tag as the Tail domain.
  6. Make sure that the protein coding sequence doesn't have any BioBrick sites in it (EcoRI, XbaI, SpeI, or PstI). If it does, you'll need to remove them.
  7. If you are planning to have the protein coding sequence made by commercial gene synthesis, design it so that recognition sites for other useful restriction enzymes are removed as well. See a list of suggested sites for removal at [http://openwetware.org/wiki/Synthetic_Biology:BioBricks/Part_fabrication#Constructing_a_BioBrick_part_via_direct_synthesis OpenWetWare].
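
The BioBrick site check in the list above can be sketched in a few lines of Python. The recognition sequences used are the standard ones for EcoRI (GAATTC), XbaI (TCTAGA), SpeI (ACTAGT), and PstI (CTGCAG). All four are palindromic, so scanning one strand is enough. The names `BIOBRICK_SITES` and `find_biobrick_sites` are hypothetical, and the sketch only reports sites; it does not attempt to remove them.

```python
# Recognition sequences of the four BioBrick enzymes.
BIOBRICK_SITES = {
    "EcoRI": "GAATTC",
    "XbaI": "TCTAGA",
    "SpeI": "ACTAGT",
    "PstI": "CTGCAG",
}


def find_biobrick_sites(cds):
    """Return (enzyme, 0-based position) for every BioBrick site in cds."""
    cds = cds.upper()
    hits = []
    for enzyme, site in BIOBRICK_SITES.items():
        start = cds.find(site)
        while start != -1:
            hits.append((enzyme, start))
            # Step forward by one base so overlapping sites are also found.
            start = cds.find(site, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


print(find_biobrick_sites("ATGGAATTCAAACTGCAGTAATAA"))
# [('EcoRI', 3), ('PstI', 12)]
```

An empty list means none of the four sites was found. Any reported site would need to be removed, for example with a synonymous codon change chosen to suit the codon usage of the intended chassis.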