Difference between revisions of "Part:BBa K4235000"
Revision as of 05:02, 15 September 2022
Human Protein S Gene (PROS1)
Codon-optimized for Sf9 cells
Usage and Biology
Protein S is a vitamin K-dependent plasma protein that helps prevent hypercoagulation of the blood. It serves as a nonenzymatic cofactor for activated protein C in the inactivation of coagulation factors Va and VIIIa. Protein S exists in two states in plasma: about 40% circulates as a free, functionally active form, and the remaining 60% is inactive, bound to C4b-binding protein. Protein S is secreted by hepatocytes, megakaryocytes, endothelial cells, and other cell types. It is initially produced as a 676 amino acid precursor protein; cleavage of a signal peptide at the N-terminus yields the mature 635 amino acid protein. Functionally active protein S can bind directly to factor IXa and inhibit it; factor IXa would otherwise activate factor X to Xa. Factor Xa and factor Va together form the prothrombinase complex, which is responsible for the activation of thrombin. In addition, by acting as a cofactor for activated protein C, protein S promotes the cleavage of factors VIIIa and Va, inhibiting the coagulation cascade.
Mutations in this gene (inherited in an autosomal dominant fashion, in either the homozygous or the heterozygous state) result in non-functional protein S or in lower plasma levels of protein S, causing protein S deficiency. Individuals with protein S deficiency are at increased risk of developing abnormal blood clots, specifically in the smaller veins, a condition known as venous thromboembolism. The two most common conditions associated with protein S deficiency are deep vein thrombosis and pulmonary embolism. Although it is rare, infants with severe protein S deficiency can develop blood clots throughout the body, resulting in a life-threatening condition known as purpura fulminans. Moreover, severe COVID-19 infections are known to cause a decline in protein S levels. This decline further contributes to infection severity by causing extensive endothelial dysfunction and lung damage, a major cause of COVID-related mortality.
Sequence and Features
- RFC[10]: INCOMPATIBLE. Illegal EcoRI site found at 2028; illegal XbaI site found at 1341
- RFC[12]: INCOMPATIBLE. Illegal EcoRI site found at 2028
- RFC[21]: INCOMPATIBLE. Illegal EcoRI site found at 2028
- RFC[23]: INCOMPATIBLE. Illegal EcoRI site found at 2028; illegal XbaI site found at 1341
- RFC[25]: INCOMPATIBLE. Illegal EcoRI site found at 2028; illegal XbaI site found at 1341
- RFC[1000]: COMPATIBLE
Protein modeling
Bioinformatics tools can be used to model the structure of a protein when its amino acid sequence is known. Computational protein structure prediction relies on principles derived from experimental techniques such as X-ray crystallography and NMR spectroscopy, together with physical energy functions, to predict the three-dimensional structures of proteins with a certain level of accuracy. These methods also use various machine learning algorithms to build and predict complete protein structures. We decided to use three methods to model our protein, protein S, for which no structure is available.
Here, we used the ab initio method to model the structure of protein S. This method can determine the tertiary structure of a protein when no structure of a similar protein is known. It conducts a conformational search using a designed energy function and generates a number of candidate conformations, from which the final models are selected.
The protein S (PROS1) sequence comprises 676 amino acids. Our literature review found that PROS1 is synthesized as a 676 amino acid precursor protein, which is processed to a mature protein of 635 amino acids. The 41 amino acid difference between the two corresponds to a signal peptide that is necessary for the expression of the protein. A post-translational modification, specifically the cleavage of a single peptide bond, yields the mature form of the protein that is secreted. We therefore chose to model both the pre-cleavage PROS1 (676 AA) and the truncated mature PROS1 (635 AA).
676 Precursor Protein Sequence
MRVLGGRCGALLACLLLVLPVSEANFLSKQQASQVLVRKRRANSLLEETKQGNLERECIEELCNKEEAREVFENDPETDYFYPKYLVCLRSFQTGLFTAARQSTNAY
PDLRSCVNAIPDQCSPLPCNEDGYMSCKDGKASFTCTCKPGWQGEKCEFDINECKDPSNINGGCSQICDNTPGSYHCSCKNGFVMLSNKKDCKDVDECSLKPSICGTAVCKNIPGDF
ECECPEGYRYNLKSKSCEDIDECSENMCAQLCVNYPGGYTCYCDGKKGFKLAQDQKSCEVVSVCLPLNLDTKYELLYLAEQFAGVVLYLKFRLPEISRFSAEFDFRTYDSEGVILYAES
IDHSAWLLIALRGGKIEVQLKNEHTSKITTGGDVINNGLWNMVSVEELEHSISIKIAKEAVMDINKPGPLFKPENGLLETKVYFAGFPRKVESELIKPINPRLDGCIRSWNLM
KQGASGIKEIIQEKQNKHCLVTVEKGSYYPGSGIAQFHIDYNNVSSAEGWHVNVTLNIRPSTGTGVMLALVSGNNTVPFAVSLVDSTSEKSQDILLSVENTVIYRIQALSLCS
DQQSHLEFRVNRNNLELSTPLKIETISHEDLQRQLAVLDKAMKAKVATYLGGLPDVPFSATPVNAFYNGCMEVNINGVQLDLDEAISKHNDIRAHSCPSVWKKTKNS
635 Mature Protein Sequence
ANSLLEETKQGNLERECIEELCNKEEAREVFENDPETDYFYPKYLVCLRSFQTGLFTAARQSTNAYPDLRSCVNAIPDQCSPLPCNEDGYMSCKDGKASFTCTCK
PGWQGEKCEFDINECKDPSNINGGCSQICDNTPGSYHCSCKNGFVMLSNKKDCKDVDECSLKPSICGTAVCKNIPGDFECECPEGYRYNLKSKSCEDIDECSENMCAQLCVNYPGG
YTCYCDGKKGFKLAQDQKSCEVVSVCLPLNLDTKYELLYLAEQFAGVVLYLKFRLPEISRFSAEFDFRTYDSEGVILYAESIDHSAWLLIALRGGKIEVQLKNEHTSKITTGGDVINNGLWN
MVSVEELEHSISIKIAKEAVMDINKPGPLFKPENGLLETKVYFAGFPRKVESELIKPINPRLDGCIRSWNLMKQGASGIKEIIQEKQNKHCLVTVEKGSYYPGSGIAQFHIDY
NNVSSAEGWHVNVTLNIRPSTGTGVMLALVSGNNTVPFAVSLVDSTSEKSQDILLSVENTVIYRIQALSLCSDQQSHLEFRVNRNNLELSTPLKIETISHEDLQRQLAVLDKA
MKAKVATYLGGLPDVPFSATPVNAFYNGCMEVNINGVQLDLDEAISKHNDIRAHSCPSVWKKTKNS
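The relationship between the two sequences listed above can be checked with a few lines of Python. This is only a minimal sketch: it uses the first line of the precursor sequence rather than the full 676 residues, and the names `PRECURSOR_FRAGMENT`, `strip_leader` and `is_processed_form` are illustrative, not part of any tool we used.

```python
# First line of the precursor sequence as listed above (a fragment only,
# to keep the example short). The first string is the 41-residue leader.
PRECURSOR_FRAGMENT = (
    "MRVLGGRCGALLACLLLVLPVSEANFLSKQQASQVLVRKRR"
    "ANSLLEETKQGNLERECIEELCNKEEAREVFENDPETDYFYPKYLVCLRSFQTGLFTAARQSTNAY"
)

# 676 precursor residues minus 635 mature residues = 41 leader residues.
LEADER_LENGTH = 676 - 635


def strip_leader(precursor: str, leader_length: int = LEADER_LENGTH) -> str:
    """Return the sequence that remains after removing the N-terminal leader."""
    if len(precursor) <= leader_length:
        raise ValueError("sequence is shorter than the leader")
    return precursor[leader_length:]


def is_processed_form(precursor: str, mature: str) -> bool:
    """True if `mature` equals `precursor` without its N-terminal leader."""
    return strip_leader(precursor) == mature


if __name__ == "__main__":
    print(strip_leader(PRECURSOR_FRAGMENT)[:10])  # prints ANSLLEETKQ
```

Passing the full precursor and mature sequences to `is_processed_form` would perform the same check over the whole protein.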
(1.) 676 Precursor Protein Sequence model:
(2.) 635 Mature Protein Sequence model:
Ramachandran plots
The Ramachandran plot shows the statistical distribution of the combinations of the backbone dihedral angles ϕ and ψ, and indicates which regions are energetically allowed and disallowed for the protein. A structure with few residues in the disallowed regions and many residues in the allowed regions is considered a good protein structure. We measured this parameter across all models obtained from the ab initio method. The Ramachandran plots were generated using MolProbity, a validation tool that is most complete for protein crystal structures and that produces coordinates, graphics and numerical evaluations. We gave more weight to the Ramachandran plot values than to the RMSD values.
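To illustrate what the two angles on a Ramachandran plot are, the sketch below computes a signed dihedral angle from four atom positions and applies it to the backbone atoms that define ϕ and ψ. It is a generic geometric calculation, not MolProbity's implementation, and the function names are illustrative assumptions.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the bond
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """phi uses C(i-1), N(i), CA(i), C(i); psi uses N(i), CA(i), C(i), N(i+1)."""
    return dihedral_degrees(c_prev, n, ca, c), dihedral_degrees(n, ca, c, n_next)
```

Each residue of a model contributes one (ϕ, ψ) pair, and the plot is the scatter of those pairs over the whole chain.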
Mathematical modeling
Many biological projects require models that can accurately represent and predict multicomponent, temporally evolving, dynamic systems. Differential equation models are often used for this purpose. This approach models the interactions of molecules as rate equations: the system is represented by a set of ordinary differential equations (ODEs) that quantify the interactions between different molecules (e.g. DNA, mRNA or protein) using the law of mass action. These equations include, for example, terms for the binding of transcription factors and RNA polymerase to DNA, interactions between transcription factors, the mRNA translation rate, and mRNA and protein degradation rates (Ay and Arnosti 2011). Building such a model requires knowledge of the system components and structure.
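As an illustration of this kind of model, the sketch below integrates a generic two-equation system for constitutive expression, dm/dt = k_tx·D − d_m·m and dp/dt = k_tl·m − d_p·p, where D is the DNA concentration, m the mRNA and p the protein. It is written in Python rather than MATLAB, uses a simple forward Euler step, and every parameter value in it is a hypothetical placeholder, not one of our fitted rate constants.

```python
def steady_state(dna, k_tx, k_tl, d_m, d_p):
    """Analytic steady state: set both derivatives to zero and solve."""
    m_ss = k_tx * dna / d_m
    p_ss = k_tl * m_ss / d_p
    return m_ss, p_ss


def simulate_expression(dna, k_tx, k_tl, d_m, d_p, t_end, dt=0.01):
    """Forward Euler integration starting from m = 0 and p = 0.

    Returns three lists: time points, mRNA levels and protein levels.
    """
    steps = int(round(t_end / dt))
    m, p = 0.0, 0.0
    times, mrna, protein = [0.0], [m], [p]
    for i in range(1, steps + 1):
        dm = k_tx * dna - d_m * m   # transcription minus mRNA degradation
        dp = k_tl * m - d_p * p     # translation minus protein degradation
        m += dt * dm
        p += dt * dp
        times.append(i * dt)
        mrna.append(m)
        protein.append(p)
    return times, mrna, protein


if __name__ == "__main__":
    # Hypothetical parameter values, in arbitrary units.
    params = dict(k_tx=2.0, k_tl=5.0, d_m=0.2, d_p=0.05)
    for dna in (0.5, 1.0, 2.0):
        _, mrna, protein = simulate_expression(dna, t_end=400.0, **params)
        print(dna, mrna[-1], protein[-1], steady_state(dna, **params))
```

In this simple model the steady-state protein level scales linearly with the DNA concentration, which is the kind of dependence explored when the DNA concentration is varied.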
We used MATLAB simulations to model our genetic circuits, with systems of ODEs derived for both the constitutive and the regulatory genetic circuit (PPBP-polyhedrin promoter interactions) in our Sf9 expression system. Entering these ODEs into MATLAB allowed us to predict steady-state concentrations before starting wet lab work and to determine which process would yield more favorable results. The plots can be found below:
Sf9 constitutive gene circuit model:
(1.) Protein vs time:
(2.) mRNA vs time:
(3.) Effects of altering DNA concentration on protein production:
Sf9 regulatory gene circuit model:
Most recombinant protein expression is driven by the most powerful baculovirus promoter, polyhedrin, which is active in the late and very late stages of infection. Several studies have examined the mechanism of the polyhedrin promoter to better characterize its transcription activity. A host transcription factor, polyhedrin promoter binding protein (PPBP), is known to bind a specific sequence on the polyhedrin promoter with extremely high affinity and specificity, and it plays a major role in the level of transcription through the polyhedrin promoter. It is established that PPBP binds to the minor groove of DNA, interacting with the polyhedrin promoter sequence and forming a complex. Further study of the PPBP-DNA interactions, and manipulation of the concentration of PPBP in host Sf9 cells, can have a significant impact on the yield of recombinant proteins.
Here, we modeled the interaction between PPBP and the polyhedrin promoter and its effect on the rate of production of mRNA and of the resulting protein S. We were unable to find an established literature value for the concentration or number of molecules of PPBP in Sf9 cells, so we estimated the concentration of PPBP from values for some commonly found transcription factors in Drosophila melanogaster. The plots of the MATLAB simulations are listed below:
(1.) mRNA vs protein S over time:
(2.) mRNA vs time:
(3.) Protein vs time:
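One common way to couple transcription factor binding to transcription, shown here only as a hedged sketch, is to scale the transcription rate by the equilibrium promoter occupancy. For PPBP + promoter ⇌ complex with dissociation constant K_d, and assuming PPBP is in excess over promoter copies, mass action gives an occupancy of [PPBP] / (K_d + [PPBP]). The function names and all numbers below are hypothetical; our actual equations and parameters are documented on the model wiki page.

```python
def promoter_occupancy(ppbp, k_d):
    """Fraction of promoters bound by PPBP at equilibrium (PPBP in excess)."""
    if ppbp < 0 or k_d <= 0:
        raise ValueError("ppbp must be non-negative and k_d positive")
    return ppbp / (k_d + ppbp)


def regulated_steady_state(dna, ppbp, k_d, k_tx, k_tl, d_m, d_p):
    """Steady-state mRNA and protein when only bound promoters are transcribed.

    Solves 0 = k_tx * dna * occupancy - d_m * m and 0 = k_tl * m - d_p * p.
    """
    occupancy = promoter_occupancy(ppbp, k_d)
    m_ss = k_tx * dna * occupancy / d_m
    p_ss = k_tl * m_ss / d_p
    return m_ss, p_ss


if __name__ == "__main__":
    # Hypothetical values in arbitrary units; the PPBP level is a free input.
    for ppbp in (0.0, 1.0, 5.0, 50.0):
        print(ppbp, regulated_steady_state(dna=1.0, ppbp=ppbp, k_d=5.0,
                                           k_tx=2.0, k_tl=5.0,
                                           d_m=0.2, d_p=0.05))
```

With this form, raising the PPBP concentration increases the predicted yield only until the promoter is saturated, so the benefit of manipulating PPBP levels depends on where the cell sits relative to K_d.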
Information on parameter estimation, rate constants and analysis can be found on our model wiki page.
Characterization
Future Directions:
References:
U.S. National Library of Medicine. (n.d.). PROS1 protein S [homo sapiens (human)] - gene - NCBI. National Center for Biotechnology Information. Retrieved August 2, 2022, from https://www.ncbi.nlm.nih.gov/gene/5627
U.S. National Library of Medicine. (n.d.). CCDS report for consensus cds. National Center for Biotechnology Information. Retrieved August 2, 2022, from https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi?REQUEST=CCDS&GO=MainBrowse&DATA=CCDS2923.1
Majumder, R., & Nguyen, T. (2021). Protein S: function, regulation, and clinical perspectives. Current opinion in hematology, 28(5), 339–344. https://doi.org/10.1097/MOH.0000000000000663
Pilli, V. S., Plautz, W., & Majumder, R. (2016). The Journey of Protein S from an Anticoagulant to a Signaling Molecule. JSM biochemistry and molecular biology, 3(1), 1014.