Difference between revisions of "Help:Protein coding sequences/Design"

Line 1: Line 1:
 
[[Help:Proteins|< Back to Protein Help]]
 
  
Protein coding sequences are DNA sequences that are transcribed into mRNA, which is then translated into a polypeptide chain that folds into a three-dimensional structure called a protein.
+
Translational units are DNA sequences that encode mRNAs that can be translated into a polypeptide chain, which then folds into a three-dimensional structure called a protein.  Each translational unit includes both an RBS and the DNA sequence that encodes the protein, called the protein coding sequence.
  
Every protein coding sequence in the Registry consists of at least three protein domains, a Head Domain, one or more Internal Domains including Special Internal Domains, and a Tail Domain.  For more details, see the [[Protein domains]] page.
+
Translational units consist of at least three protein domains: a Head Domain, one or more Internal Domains (including Special Internal Domains), and a Tail Domain.  For more details, see the [[Protein domains]] page.  Thus, the protein coding sequence starts within the Head domain and ends with the Tail domain.
  
 
Before you design a new protein coding sequence, check whether your [[Protein coding sequence|protein coding sequence already exists in the Registry]].  Many commonly used proteins like [[Protein_coding_sequences/Reporters|fluorescent proteins]] and [[Protein_coding_sequences/Transcriptional_regulators|repressors]] are already available in the Registry.
 
Line 9: Line 9:
 
Here are some basic things to consider when designing protein coding sequences.
 
  
#The Head domain of most protein coding sequences is made up of either just the start codon ATG (as is the case for most existing protein coding sequences) or a ribosome binding site and ATG start codon (the preferred design for new protein coding sequences).  In ''E. coli'', however, the rate of translational initiation depends not only on the sequence of the ribosome binding site and start codon, but also on the second and third codons.  So we suggest that you also include the second and third codons in the Head domain.
+
#In ''E. coli'', the rate of translational initiation depends not only on the sequence of the ribosome binding site and start codon, but also on the second and third codons.  So we suggest that you also include the second and third codons in the Head domain.
 
#*If you also want to control where the protein is located in the cell, you might consider a [[Protein_domains/Localization|localization sequence]] as the Head domain.   
 
 
#*If you also want to purify or quantify the protein later, you might add an affinity tag as the Head domain.
 

Revision as of 02:49, 30 April 2009


Translational units are DNA sequences that encode mRNAs that can be translated into a polypeptide chain, which then folds into a three-dimensional structure called a protein. Each translational unit includes both an RBS and the DNA sequence that encodes the protein, called the protein coding sequence.

Translational units consist of at least three protein domains: a Head Domain, one or more Internal Domains (including Special Internal Domains), and a Tail Domain. For more details, see the Protein domains page. Thus, the protein coding sequence starts within the Head domain and ends with the Tail domain.

Before you design a new protein coding sequence, check whether your protein coding sequence already exists in the Registry. Many commonly used proteins like fluorescent proteins and repressors are already available in the Registry.

Here are some basic things to consider when designing protein coding sequences.

  1. In E. coli, the rate of translational initiation depends not only on the sequence of the ribosome binding site and start codon, but also on the second and third codons. So we suggest that you also include the second and third codons in the Head domain.
    • If you also want to control where the protein is located in the cell, you might consider a localization sequence as the Head domain.
    • If you also want to purify or quantify the protein later, you might add an affinity tag as the Head domain.
  2. A protein coding sequence contains one or more Internal domains. Design the internal domains so that the codon usage is optimized for the chassis in which the part will be used.
  3. The Tail domain of most protein coding sequences is made up of just a double stop codon TAATAA.
    • If you want to control the degradation rate of the protein in E. coli, you might add a degradation tag as the Tail domain.
    • If you also want to purify or quantify the protein later, you might add an affinity tag.
  4. Make sure that the protein coding sequence doesn't have any BioBrick sites in it (EcoRI, XbaI, SpeI, or PstI). If it does, you'll need to remove them.
  5. If you are planning to synthesize the protein coding sequence via commercial gene synthesis, design it so that it does not contain recognition sites for other useful restriction enzymes. See a list of suggested sites for removal at [http://openwetware.org/wiki/Synthetic_Biology:BioBricks/Part_fabrication#Constructing_a_BioBrick_part_via_direct_synthesis OpenWetWare].
  6. After you do an initial design of your protein coding sequence, we suggest you run it through a secondary structure prediction algorithm like [http://mfold.bioinfo.rpi.edu/cgi-bin/rna-form1.cgi mFold] to check if any hairpins form. In particular you should look for any hairpins that form between the RBS and the rest of the protein coding sequence that might inhibit translational initiation. If you find a hairpin, try to make some silent mutations in the coding sequence to disrupt the stability of the secondary structure.
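
Several of the checks in the list above are mechanical and can be scripted before you submit a design. The following is a minimal Python sketch, not an official Registry tool. The function names `find_biobrick_sites` and `check_coding_sequence` are hypothetical. It assumes the input is the coding sequence alone, beginning at the ATG start codon (no RBS), and ending with the double stop codon TAATAA. It looks only for the four BioBrick sites named in item 4 (EcoRI `GAATTC`, XbaI `TCTAGA`, SpeI `ACTAGT`, PstI `CTGCAG`); because all four sites are palindromic, scanning one strand is enough. It does not check codon usage or secondary structure.

```python
# Recognition sites of the four BioBrick enzymes named in item 4.
BIOBRICK_SITES = {
    "EcoRI": "GAATTC",
    "XbaI": "TCTAGA",
    "SpeI": "ACTAGT",
    "PstI": "CTGCAG",
}


def find_biobrick_sites(seq):
    """Return {enzyme: [0-based start positions]} for each site present in seq."""
    seq = seq.upper()
    hits = {}
    for enzyme, site in BIOBRICK_SITES.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            # Continue one base further so overlapping sites are not missed.
            start = seq.find(site, start + 1)
        if positions:
            hits[enzyme] = positions
    return hits


def check_coding_sequence(seq):
    """Return a list of problems found; an empty list means none were detected.

    Assumption: seq starts at the start codon and ends with TAATAA.
    """
    seq = seq.upper()
    problems = []
    if not seq.startswith("ATG"):
        problems.append("does not start with ATG")
    if not seq.endswith("TAATAA"):
        problems.append("does not end with the double stop codon TAATAA")
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of three")
    for enzyme, positions in find_biobrick_sites(seq).items():
        problems.append(f"{enzyme} site at position(s) {positions}")
    return problems


if __name__ == "__main__":
    # Toy sequence: ATG GAA TTC AAA TAA TAA contains an EcoRI site (GAATTC).
    print(check_coding_sequence("ATGGAATTCAAATAATAA"))
    # Silent mutation GAA -> GAG (both encode glutamate) removes the site.
    print(check_coding_sequence("ATGGAGTTCAAATAATAA"))
```

The second call illustrates the kind of silent mutation that items 4 and 6 describe: the amino acid sequence is unchanged, but the restriction site is gone. The same approach of swapping synonymous codons applies when disrupting a predicted hairpin, although the structure prediction itself still has to be done with a tool such as mFold.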