Revision as of 03:32, 21 October 2019

Sortase A7M (Ca²⁺-independent variant)

Profile

Name	Sortase A7M
Base pairs	450
Molecular weight	17.85 kDa
Origin	Staphylococcus aureus, synthetic
Properties	Ca²⁺-independent transpeptidase, links the LPXTG sorting motif to a poly-glycine tag

Usage and Biology

generic filler text

Transpeptidase: Sortase

generic filler text

Reaction

generic filler text

Sortase variants

generic filler text

Sortase A7M

generic filler text

Methods

generic filler text

Cloning

generic filler text

Expression and purification

generic filler text

SDS-PAGE

generic filler text

Fluorescence Resonance Energy Transfer (FRET)

generic filler text

Mass Spectrometry

generic filler text

Results

generic filler text

Characterization of Sortase A7M (and comparison to Sortase A5M)

generic filler text

How do we measure if our purified sortases are active?

generic filler text

How do we measure sortase reaction kinetics?

generic filler text

Development of a new FRET pair

generic filler text

Why are enzyme-substrate ratio and duration important parameters of the sortase reaction?

generic filler text

Who wins: Sortase A7M or Sortase A5M?

generic filler text

What about other substrates?

generic filler text

Primary Amines

generic filler text

Yield

generic filler text

Is Sortase A7M able to attach cargo to P22 coat protein?

generic filler text

Does methionine affect Sortase linking?

generic filler text

Are there other Sortases that might be useful?

generic filler text

Modeling

Introduction

In synthetic biology, theoretical models are often used to gain insights and to predict and improve experiments. In our project we modify virus-like particles (VLPs) by attaching proteins to the surface of the P22 capsid through a linker. The linking is catalyzed by the enzyme Sortase A7M, a calcium-independent mutant of the wild-type Sortase A from Staphylococcus aureus. We performed modeling to predict the unknown structure of Sortase A7M and to improve the linker between the proteins, thereby optimizing the modification efficiency of our platform.
Two different modeling approaches were used to determine the structure of Sortase A7M: we compared a machine learning approach with traditional comparative, Monte-Carlo-based modeling. The results were evaluated using an energy-scoring function and molecular dynamics (MD) simulations. The most promising Sortase A7M structures were used in a docking simulation to screen for optimal linkers.

Structure determination

In silico modeling and simulation of proteins requires a 3D structure, which can usually be obtained from the RCSB Protein Data Bank. However, if no 3D structure has been deposited, as is the case for Sortase A7M, the structure has to be determined by other means. We predicted the structure of Sortase A7M using two different approaches.

RosettaCM

Background

In our second approach we used RosettaCommons comparative modeling (RosettaCM), which is based on homology modeling. Homology modeling is a protein modeling method that requires one or more template structures on which the target protein is modeled. The sequences of the templates are aligned with the sequence of the target protein. Unaligned sections are modeled using fragment or protein libraries, so the resulting structures are built from different sequence homologues of the protein of interest. Ab-initio or de novo modeling, on the other hand, attempts to find protein structures solely from physicochemical principles applied to the primary sequence, which can be compared to the refolding of a denatured protein.

RosettaCM combines ab-initio modeling with homology modeling. Regions of the target for which a resolved 3D structure with a sufficiently similar sequence exists are built by homology modeling. Afterwards, the unaligned regions are modeled de novo. By combining the two methods, RosettaCM provides a precise and resource-efficient tool for protein structure prediction. Rosetta applications rely on Monte-Carlo optimization, a probabilistic approach to finding a local minimum in the energy landscape of protein conformations. The equation at the foundation of the statistical Monte-Carlo method is the Metropolis acceptance criterion:

\[ p = \min\left(1, \exp\left(-\frac{\Delta E}{k_B T}\right)\right) \]

where p is the probability of accepting a proposed structure, k_B is the Boltzmann constant, ΔE the difference in energy between the two states and T the temperature. The term 1/(k_BT) can also be written as a single factor β.

During statistical protein folding with the Monte-Carlo method, the current structure is changed by small random perturbations of the atom positions. Whether the perturbed structure is accepted is decided by the Metropolis acceptance criterion: if ΔE < 0, the new structure is always accepted; otherwise it is accepted with the probability p given by the criterion.
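The acceptance rule described above can be illustrated with a minimal Python sketch. It is not the Rosetta implementation: the one-dimensional toy energy function, the step size, the value of k_BT and the function names are all assumptions chosen for illustration.

```python
import math
import random


def metropolis_accept(delta_e, kT, rng=random):
    """Metropolis criterion: always accept downhill moves, accept uphill
    moves with probability exp(-delta_e / kT)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


def monte_carlo_minimize(energy, x0, steps=5000, step_size=0.1, kT=0.1, seed=0):
    """Toy Monte-Carlo search on a 1D coordinate (hypothetical stand-in for
    a protein conformation). Returns the lowest-energy state visited."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(steps):
        # Small random perturbation of the current state.
        candidate = x + rng.uniform(-step_size, step_size)
        e_new = energy(candidate)
        if metropolis_accept(e_new - e, kT, rng):
            x, e = candidate, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


def toy_energy(x):
    """Illustrative energy landscape with a single minimum at x = 2."""
    return (x - 2.0) ** 2


best_x, best_e = monte_carlo_minimize(toy_energy, x0=0.0)
print(f"lowest energy {best_e:.4f} found at x = {best_x:.3f}")
```

Occasionally accepting uphill moves is what allows the search to leave shallow local minima; a larger k_BT makes such moves more likely.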

Procedure

The RosettaCM protocol requires evolutionarily related structures and sequences, as well as fragment files for the target sequence. The fragment files serve as local structure templates and consist of peptide fragments of 3 and 9 residues. We gathered five evolutionarily related structures from the RCSB PDB with the accession numbers:

1ija
1itw
1itp
1ito
2mlm

The five RCSB entries represent different structures of sortases from Staphylococcus aureus. Fragment files can be created with the Robetta online server or with the Rosetta FragmentPicker application.

The RosettaCM procedure is best described in the following steps:

sequence and structural alignment of templates
fragment insertion in unaligned sections
replacement of random segment with segment from a different template structure
energy minimization
all-atom optimization

The alignment can be performed with various tools. We used MAFFT to generate the multiple sequence alignments. Before being used as input, the alignments were converted to the Grishin alignment format, which RosettaCM requires. The energy minimization is performed using the Rosetta centroid energy function. For the centroid function to be applied, the protein is converted to the centroid representation, which consists of the backbone atoms (N, C_α, C and the carbonyl O) and a pseudo-atom of varying size representing the side chain. The advantage of the centroid representation is that its smoother energy landscape can be traversed more easily. Finally, the generated structure undergoes a second minimization in an all-atom model by means of Monte-Carlo optimization. This is similar to the energy minimization, but the side chains are no longer represented as centroids. Structures computed through all-atom optimization can reach atomic resolution {{Source: Rosetta paper}}, which is crucial for a model meant to be used to estimate atomic interactions.
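Before templates are passed on, it can be useful to check how similar each aligned template is to the target. The following Python sketch reads an aligned FASTA text, as alignment tools such as MAFFT can produce, and computes the pairwise sequence identity. The sequences and record names are made up for illustration, and the helper names are hypothetical; this is not part of the RosettaCM protocol itself.

```python
def parse_aligned_fasta(text):
    """Parse aligned FASTA text into a dict {name: gapped_sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def identity(seq_a, seq_b):
    """Fraction of identical residues over columns where neither
    sequence has a gap ('-')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


# Made-up toy alignment, NOT real sortase sequences.
EXAMPLE_ALIGNMENT = """\
>target
MKT-AL
>template_1
MRTGAL
"""

alignment = parse_aligned_fasta(EXAMPLE_ALIGNMENT)
value = identity(alignment["target"], alignment["template_1"])
print(f"identity target vs template_1: {value:.2f}")
```

Identity is computed only over columns without gaps, so a template that covers only part of the target is not penalized for the uncovered region.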

Results

The run yielded 15,000 structures, which were compared using the Rosetta scoring function talaris2013. Of these 15,000 structures, we inspected the ten best-scoring ones.
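Selecting the best-scoring models can be sketched in a few lines of Python. The sketch assumes a whitespace-separated score file in which every data line starts with `SCORE:`, the first such line is the column header, and the columns `total_score` and `description` are present. The example values and model tags below are invented for illustration and are not results from our run.

```python
def parse_score_lines(lines):
    """Parse 'SCORE:' lines into a list of dicts keyed by column name.
    The first 'SCORE:' line is treated as the header (assumed layout)."""
    header = None
    records = []
    for line in lines:
        if not line.startswith("SCORE:"):
            continue
        fields = line.split()[1:]
        if header is None:
            header = fields
            continue
        if len(fields) != len(header):
            continue  # skip malformed or truncated rows
        records.append(dict(zip(header, fields)))
    return records


def best_models(records, n=10, column="total_score"):
    """Return the n records with the lowest (most favorable) score."""
    return sorted(records, key=lambda r: float(r[column]))[:n]


# Invented example content, NOT real output.
EXAMPLE_SCORES = [
    "SEQUENCE:",
    "SCORE: total_score fa_atr description",
    "SCORE: -310.5 -520.1 S_00001",
    "SCORE: -298.0 -505.7 S_00002",
    "SCORE: -325.1 -533.9 S_00003",
]

for record in best_models(parse_score_lines(EXAMPLE_SCORES), n=2):
    print(record["description"], record["total_score"])
```

Lower Rosetta energies are more favorable, which is why the records are sorted in ascending order.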

As can be seen in figure 5, the most prominent differences are found in the regions close to the N- and C-termini. As fluctuations in those regions are not unusual, we decided to use the best-scoring structure, candidate S_14771 (figure 6), as the input for the subsequent simulations.

Figure x: The structural alignment of the ten best-scoring sortase structures displays only minor differences, with the exception of the N- and C-terminal regions. Terminal regions tend to show strong fluctuations, so it is unsurprising that they do not align.

Figure x: Sortase A7M candidate S_14771 created through RosettaCM.

In order to evaluate the secondary structure of the Sortase A7M candidate S_14771, a Ramachandran plot was created and compared with those of the five sortases used as input for the comparative modeling. Comparisons were also drawn with the sortase structure predicted by deep learning as well as with a database of randomly sampled proteins. Ramachandran plots of the backbone dihedral angles (fig x) can be a first indicator of whether the computed structures are valid.

Figure x : Caption?

Figure 5: The comparison of the Ramachandran plot of structure S_14771 with the Ramachandran plot found on Proteopedia suggests that secondary structures are present: the structure appears to contain α-helices, β-sheets and a small number of left-handed α-helices.
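A Ramachandran plot is built from the backbone dihedral angles φ and ψ of each residue. The following Python sketch shows how such angles can be computed from atom coordinates; it assumes the backbone coordinates have already been extracted into plain (x, y, z) tuples, and the function and key names are hypothetical. It is an illustration of the geometry, not the tool used to create the figures.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points, measured
    around the p1-p2 axis."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(residues):
    """Compute (phi, psi) for every inner residue. `residues` is a list of
    dicts with the keys 'N', 'CA' and 'C' (assumed input layout)."""
    angles = []
    for prev, cur, nxt in zip(residues, residues[1:], residues[2:]):
        phi = dihedral(prev["C"], cur["N"], cur["CA"], cur["C"])
        psi = dihedral(cur["N"], cur["CA"], cur["C"], nxt["N"])
        angles.append((phi, psi))
    return angles


# Simple geometric check with made-up points.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

The first and last residues are skipped because φ requires the preceding residue and ψ the following one.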

Conclusion

We used a machine learning method as well as Monte-Carlo simulations to determine the structure of the mutated transpeptidase Sortase A7M. The machine learning approach, using AlQuraishi's deep neural network, yielded a structure that did not appear to contain any secondary structure elements. To exclude the possibility of an error in the PyMOL visualization software by Schrödinger, a Ramachandran plot (figure xyz) was created. The plot shows that no typical secondary structures are present, which is a strong indicator that this approach failed to determine a valid structure. The second approach, Rosetta comparative modeling, yielded 15,000 structures scored with the talaris2013 scoring function. The ten best structures were aligned and exhibited almost identical secondary structures (figure xzy). The greatest structural differences are present in the N- and C-terminal regions. Since terminal regions tend to fluctuate more strongly than non-terminal segments of the protein, we deemed those differences not relevant for the protein's function.
As the best-scoring candidate, structure S_14771 was analyzed using a Ramachandran plot (figure xyz). The plot shows all the relevant and typical secondary structures that sortases exhibit and serves as an indicator of a successful structure prediction.
In the next steps, a molecular dynamics (MD) simulation will be performed on both structures. Even though structure CASP12 does not seem to be valid, refolding during an MD simulation might relax the protein and allow for a promising prediction of the Sortase A7M structure.