Difference between revisions of "Part:BBa K1799008"
Latest revision as of 03:13, 19 September 2015
LsrC (modularization optimized)
Gene for the C subunit of the LsrACDB transporter, present in E. coli on the LsrACDBFG operon. The C subunit is a permease protein in the transporter, which actively brings autoinducer 2 (AI-2) into the cell.
The portion of the LsrACDBFG operon that codes for the AI-2 transporter consists of four open reading frames, which code for A, C, D, and B, respectively. Three of these ORFs overlap with one another: C overlaps with A, so the RBS for C lies within the A subunit ORF; D overlaps with C (although to a lesser extent), so the RBS for D lies within the C subunit ORF. From a modularization standpoint, this creates an inefficiency because the wild-type ORFs of the A and C subunits both contain functional RBSs and, in the case of the A subunit ORF, an ATG 4 bp downstream of that RBS. To reduce this inefficiency, we have created versions of each subunit ORF that are optimized for modularization.
Usage and Biology
Modularization of the LsrACDB Operon
This part is one of a collection of LsrACDB operon parts (BBa_K1799003-10). The wild-type part sequences are copies of the naturally-occurring ORFs, with RBSs for LsrC and D embedded in LsrA and C, respectively. When tuning a device governing the transport of AI-2, it will be useful to have both the wild type and optimized, modularized versions of these translational units. Therefore we submitted both versions.
Modularization of operon functionality serves three important objectives. First, it allows for revised control of individual elements working in concert, including alterations in relative gene expression rates based on prescribed design goals (revision). Second, it enables “copy and paste” excision of individual functionalities such as promoters, ribosome binding sites and coding sequences embedded within the operon (excision). Third, it allows for precise standardization of functionalities and improved predictability of usage (precision).
The last two objectives (excision and precision) were the major focus of our iGEM project involving the LsrACDB operon. The first objective (revision) could not be pursued due to time constraints. However, the excision and precision work accomplished by the Genspace iGEM team serves as an important foundation for future revision pursuits.
The LsrACDB operon can be written as shown in the figure below:
Figure: LsrACDB operon (https://static.igem.org/mediawiki/2015/3/3b/GenspaceProject_ModularizationFig1.jpg)
The Ribosome Binding Sites for lsrA and lsrB are already modular as they reside outside of any coding sequences. However, the Ribosome Binding Sites for lsrC and lsrD reside within the coding sequences for lsrA and lsrC, respectively. Therefore, our objective was not only to excise these Ribosome Binding Sites, but also to increase the effective “precision” in the specification of the coding sequences to minimize the chance that some portion would be spuriously recognized as a Ribosome Binding Site. This was accomplished through the introduction of silent mutations that preserved the amino acid sequence of the expressed proteins, and yet eradicated or greatly reduced the efficacy of the strongest observed Ribosome Binding Sites. Finally, this was done not just for lsrA and lsrC (to yield what we would call lsrA2 and lsrC2) but for all four coding sequences as shown below:
Figure: Modularized LsrACDB operon (https://static.igem.org/mediawiki/2015/9/9e/GenspaceProject_ModularizationFig2.jpg)
The promoter, pLsrA2, has been defined based on the native pLsrA to begin with the CRP binding site 112bp upstream of wild-type LsrA (see www.ecogene.org) and to end just before the putative RBSA (i.e., CGGGGG, which begins 10bp upstream of wild-type LsrA). The RBSA differs from the Shine-Dalgarno sequence (AGGAGG) in only two places.
The lsrA coding sequence ends in the following base-pairs with periods denoting codon boundaries:
...CGT.CAG.GAG.GCG.TCA.TGC.TGA
The ATG that begins the lsrC coding sequence (shown in bold in the original formatting) spans the last base of the TCA codon and the first two bases of the TGC codon. In a modularized lsrA, this ATG runs the risk of acting as a spurious start codon (resulting in the unintended expression of downstream sequence). The Shine-Dalgarno sequence AGGAGG (underlined in the original formatting) runs from the second base of the CAG codon through the first base of the GCG codon, which places it the textbook distance of 4 bp upstream of the ATG. Modularization of lsrA requires that the coding sequence function only as the intended coding sequence, and not as the intended coding sequence plus a ribosome binding site plus the beginning of some additional spurious coding sequence. We now focus on modularizing lsrA through the strategic introduction of silent mutations.
The end of the lsrA sequence above codes for the following amino acids:
…Arg.Gln.Glu.Ala.Ser.Cys.[STOP]
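The positions described above can be checked with a short script. This is a minimal sketch, not part of the original workflow: it uses only the 21-base tail quoted above, builds the standard genetic code from the conventional TCAG ordering, and the names `LSRA_TAIL`, `translate`, `sd_start`, `atg_start` and `spacer` are illustrative.

```python
# Tail of wild-type lsrA as quoted above, with the codon-boundary periods removed.
LSRA_TAIL = "CGTCAGGAGGCGTCATGCTGA"
SHINE_DALGARNO = "AGGAGG"

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame DNA string into one-letter amino acids ('*' = stop)."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Locate the Shine-Dalgarno match and the first ATG downstream of it.
sd_start = LSRA_TAIL.find(SHINE_DALGARNO)
atg_start = LSRA_TAIL.find("ATG", sd_start + len(SHINE_DALGARNO))
spacer = atg_start - (sd_start + len(SHINE_DALGARNO))

print("protein tail:", translate(LSRA_TAIL))
print("SD at index", sd_start, "| ATG at index", atg_start, "| spacer:", spacer, "bp")
print("ATG is out of frame with lsrA:", atg_start % 3 != 0)
```

The ATG is found out of frame with lsrA, which is why it can start the overlapping lsrC reading frame without appearing as a Met codon in LsrA.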
We exploit the degeneracy of the codon table to find a new sequence that eradicates the RBS and the start codon while preserving the amino acid sequence through silent mutations. The carboxyl-terminal amino acids of lsrA and their possible codons are as follows.
Glutamine: CAA or CAG
Glutamic Acid: GAA or GAG
Alanine: GCT, GCC, GCA, GCG
Serine: AGT, AGC, TCT, TCC, TCA, TCG
Cysteine: TGT, TGC
Note that, in the original formatting, underscoring and boldface distinctions are preserved in the above list of codons in order to be consistent with the location of the Shine-Dalgarno sequence and the start codon, respectively, in the wild-type lsrA sequence.
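The codon choices listed above follow directly from the standard genetic code. As an illustrative sketch (the names `CODON_TABLE` and `SYNONYMS` are not from the original work), the synonymous codons for each of the tail amino acids can be enumerated like this:

```python
from collections import defaultdict

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Invert the table: amino acid -> every codon that encodes it.
SYNONYMS = defaultdict(list)
for codon, amino_acid in CODON_TABLE.items():
    SYNONYMS[amino_acid].append(codon)

# One-letter codes for the tail residues: Gln, Glu, Ala, Ser, Cys.
for amino_acid in "QEASC":
    print(amino_acid, sorted(SYNONYMS[amino_acid]))
```

Any codon swap drawn from the same list is a silent mutation, which is the freedom the following paragraphs use.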
Working in reverse order, no change to the Cysteine codon can remove the TG that ends the spurious ATG start codon, since both Cysteine codons (TGT and TGC) begin with TG. The Serine codon, however, can be changed to eliminate the spurious ATG; the question is which alternate codon to choose. We note that E. coli actually employs several start codons: 83% are ATG, 11% are GTG, 3% are TTG and the balance are believed to be ATT and CTG (https://en.wikipedia.org/wiki/Start_codon). We therefore choose TCC as the codon for Serine, which changes the spurious ATG start codon to CTG (i.e., a very improbable start codon).
We now set out to destroy the Ribosome Binding Site near the end of lsrA. Continuing our backwards march, we see no strong reason at present to change the codon for Alanine: every Alanine codon begins with G, so the G that terminates the Shine-Dalgarno sequence would remain in any case. However, changing the Glutamic Acid codon to GAA and the Glutamine codon to CAA changes the Shine-Dalgarno portion of the RBS from AGGAGG to AAGAAG (i.e., a two-letter difference).
We can estimate the degree to which we have reduced the efficacy of the Ribosome Binding Site by using an on-line calculator (https://www.denovodna.com/software/, based on http://www.nature.com/nbt/journal/v27/n10/full/nbt.1568.html ). Under “Predict: Translation Rates” and using Version 1.1 of the Free Energy Model and setting the 16S rRNA to “ACCUCCUUA”, we estimate the relative ribosomal translation rate using the last 27 base pairs of lsrA2 candidates (20 upstream of the potentially spurious start codon and 7 downstream).
In particular, we study four cases:
1. No change to the original lsrA
2. Changing the spurious start codon only
3. Changing the Shine-Dalgarno sequence only
4. Changing both the spurious start codon and the Shine-Dalgarno sequence
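The four cases can be written out explicitly. The sketch below builds each variant of the 21-base lsrA tail from the codon swaps described above (CAG to CAA, GAG to GAA, TCA to TCC) and confirms that every variant still encodes the same amino acids. It does not compute translation rates, which came from the online calculator, and it covers only the quoted tail rather than the 27-base windows submitted to that calculator. The helper names `make_variant` and `CASES` are hypothetical.

```python
# Tail of wild-type lsrA as quoted in the text (codon boundaries removed).
WILD_TAIL = "CGTCAGGAGGCGTCATGCTGA"

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame DNA string into one-letter amino acids ('*' = stop)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def make_variant(change_start, change_sd):
    """Apply the silent codon swaps from the text to the wild-type tail."""
    codons = [WILD_TAIL[i:i + 3] for i in range(0, len(WILD_TAIL), 3)]
    if change_sd:
        codons[1] = "CAA"  # Gln: CAG -> CAA
        codons[2] = "GAA"  # Glu: GAG -> GAA
    if change_start:
        codons[4] = "TCC"  # Ser: TCA -> TCC, turning the spurious ATG into CTG
    return "".join(codons)


CASES = {
    1: make_variant(change_start=False, change_sd=False),
    2: make_variant(change_start=True, change_sd=False),
    3: make_variant(change_start=False, change_sd=True),
    4: make_variant(change_start=True, change_sd=True),
}

for number, seq in CASES.items():
    print(number, seq, translate(seq), "AGGAGG" in seq, "ATG" in seq)
```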
The normalized results are as follows:
Figure: Relative Translation Rate versus changes to lsrA (https://static.igem.org/mediawiki/2015/4/4f/GenspaceProject_ModularizationFig3.jpg)
The original lsrA sequence results in a translation rate normalized to 100. By changing the spurious start codon (i.e., using TCC for Serine to change the spurious start codon from ATG to CTG), we are able to reduce the rate four-fold. But the major gains are brought about by reducing similarity to the Shine-Dalgarno sequence: changing the AGGAGG sequence to AAGAAG (i.e., just a two-letter difference) reduces the translation rate by almost two orders of magnitude. Changing both the spurious start codon and the Shine-Dalgarno similarity leaves the translation rate approximately the same, at 1.8. This suggests that the removal of RBS candidates should focus on reducing similarity to the Shine-Dalgarno sequence.
When we consider lsrA in its entirety, we can compare each group of 6 adjacent base pairs with the Shine-Dalgarno sequence. Some will have 6 letters in agreement (such as the one we were just studying) and some will have none. The groups with 6 letters in agreement, and even those with 5, are a concern as unintended RBSs because of their similarity to the Shine-Dalgarno sequence. If we can introduce silent mutations that do not change the expressed amino acids but produce at least two letters of disagreement with the Shine-Dalgarno sequence, then we estimate that we have reduced the spurious translation rate by approximately two orders of magnitude. We now proceed to do this for lsrA.
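The window-by-window comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the team's actual script: it slides a 6-base window along a sequence, counts the positions that agree with AGGAGG, and tallies the counts. Only the quoted 21-base tail and its edited counterpart are used here, since the full lsrA and lsrA2 sequences are not reproduced in this text; the function names are hypothetical.

```python
from collections import Counter

SHINE_DALGARNO = "AGGAGG"


def agreement(window, motif=SHINE_DALGARNO):
    """Count the positions at which a window matches the motif."""
    return sum(a == b for a, b in zip(window, motif))


def agreement_histogram(seq, motif=SHINE_DALGARNO):
    """Tally agreement scores over every window of len(motif) adjacent bases."""
    k = len(motif)
    return Counter(agreement(seq[i:i + k], motif) for i in range(len(seq) - k + 1))


def flagged_windows(seq, min_agreement=5, motif=SHINE_DALGARNO):
    """List (index, window) pairs that agree with the motif at >= min_agreement positions."""
    k = len(motif)
    return [
        (i, seq[i:i + k])
        for i in range(len(seq) - k + 1)
        if agreement(seq[i:i + k], motif) >= min_agreement
    ]


wild_tail = "CGTCAGGAGGCGTCATGCTGA"    # end of wild-type lsrA, as quoted in the text
edited_tail = "CGTCAAGAAGCGTCCTGCTGA"  # same tail after the silent mutations

for label, seq in (("wild-type", wild_tail), ("edited", edited_tail)):
    hist = agreement_histogram(seq)
    print(label, dict(sorted(hist.items())), flagged_windows(seq))
```

Run against a full coding sequence, the same histogram gives the distribution of agreement, and `flagged_windows` lists the candidate sites that still need a silent mutation.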
In the original lsrA, we find that only one group of 6 adjacent base pairs perfectly matches the Shine-Dalgarno sequence (i.e., the one we have been studying so far). There are two groups of 6 adjacent base pairs that differ by only one letter and forty groups that differ by two letters. The full distribution of agreement for lsrA is shown below:
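Figure: Histogram of Shine-Dalgarno agreement in lsrA & lsrA2 (https://static.igem.org/mediawiki/2015/b/bd/GenspaceProject_ModularizationEvalFigFour.png)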
We also show the distribution of agreement for an edited lsrA2 that has a number of silent mutations introduced. There is no difference in the expressed amino acid sequence for LsrA2 compared to LsrA, but no sequence of 6 adjacent base pairs in lsrA2 has more than 4 letters in agreement with the Shine-Dalgarno sequence. As such, there should be few if any unintended Ribosome Binding Sites in lsrA2. The same exercise was repeated to generate lsrB2, lsrC2 and lsrD2 to complete the modularization exercise.
Sequence and Features
- COMPATIBLE WITH RFC[10]
- COMPATIBLE WITH RFC[12]
- COMPATIBLE WITH RFC[21]
- COMPATIBLE WITH RFC[23]
- INCOMPATIBLE WITH RFC[25]: Illegal AgeI site found at 706
- COMPATIBLE WITH RFC[1000]