Difference between revisions of "Part:BBa K3187028"

Line 1: Line 1:
<partinfo>BBa_K3187028 short</partinfo>
<partinfo>BBa_K3187028 short</partinfo>
Line 31: Line 31:
<h3> Usage and Biology</h3>
<h1>Usage and Biology</h1>
    generic filler text
<h3>Transpeptidase: Sortase</h3>
        generic filler text
    generic filler text
<h3>Sortase variants</h3>
    generic filler text
<h3>Sortase A7M</h3>
    generic filler text
    generic filler text
    generic filler text
<h3>Expression and purification</h3>
    generic filler text
    generic filler text
<h3>Fluorescence Resonance Energy Transfer (FRET)</h3>
    generic filler text
<h3>Mass Spectrometry</h3>
    generic filler text
    generic filler text
<h2>Characterization of Sortase A7M (and comparison to Sortase A5M)</h2>
    generic filler text
<h3>How do we measure if our purified sortases are active?</h3>
    generic filler text
<h2>How do we measure sortase reaction kinetics?</h2>
    generic filler text
<h3>Development of a new FRET pair</h3>
    generic filler text
<h2>Why are enzyme-substrate ratio and duration important parameters of the sortase reaction?</h2>
    generic filler text
<h2>Who wins - Sortase A7M or Sortase A5M?</h2>
    generic filler text
<h2>What about other substrates?</h2>
    generic filler text
<h3>Primary Amines</h3>
    generic filler text
    generic filler text
<h2>Is Sortase A7M able to attach cargo to P22 coat protein?</h2>
    generic filler text
<h2>Does methionine affect Sortase linking?</h2>
    generic filler text
<h2>Are there other Sortases that might be useful?</h2>
    generic filler text
<!-- Start inserting here -->
    In synthetic biology, theoretical models are often used to gain insights and to
    predict the outcome of experiments. In our project we modify virus-like particles
    (VLPs) by attaching proteins to the surface of the P22 capsid
    <!-- link to Background or Project overview --> through a linker. The linking is
    catalyzed by the enzyme Sortase A7M, a calcium-independent mutant of the wild-type
    Sortase A <!-- link to Sortase Background --> from <i>Staphylococcus aureus</i>.
    We performed modeling to predict the unknown structure of Sortase A7M and to improve
    the linker between the proteins, thereby optimizing the efficiency of our platform. <br>
    Two different modeling approaches were used to determine the structure of Sortase A7M.
    We compared machine learning approaches to traditional comparative, Monte Carlo-based
    modeling methods. The results were evaluated using an energy-scoring function and
    molecular dynamics (MD) simulations. The most promising Sortase A7M structures were
    used to perform a docking simulation to screen for optimal linkers.
<h2>Structure determination</h2>
    <i>In silico</i> modeling and simulation of proteins requires a 3D structure,
    which can be obtained from the <a href="https://www.rcsb.org/" target="_blank">RCSB Protein
    Data Bank</a>. However, if no 3D structure is deposited, as is the case with
    Sortase A7M, the structure has to be determined by other means. The structure prediction
    of Sortase A7M was done using two different approaches.
    In our second approach we used <a href="rosettacommons.org"
    target="_blank"><i>RosettaCommons</i></a> <i>comparative modeling
    (<a>RosettaCM</a>)</i>, which
    is based on homology modeling. <i>Homology modeling</i> is a protein modeling
    method that requires one or more template structures as a basis for the protein to
    be modeled. The template sequences are aligned with the sequence of the target
    protein. Unaligned sections are modeled using fragment or protein libraries, which
    yields <!-- aesthetics --> protein structures based
    on different sequence homologues of the protein of interest.
    <i>Ab-initio</i> or <i>de novo</i> modeling, on the other hand, attempts to find
    structures solely based on physicochemical principles applied to the primary
    sequence, which can be compared to the refolding of a denatured protein.
<p>RosettaCM combines <i>ab-initio modeling</i> with <i>homology modeling</i>. The
    regions for which a resolved 3D structure with a sufficiently similar
    sequence exists are modeled using homology modeling. Afterwards the unaligned
    sequences are modeled de novo. By combining the two methods, RosettaCM
    represents a precise and resource-efficient tool for protein structure prediction.
    Rosetta applications rely on Monte Carlo optimization, which is an
    approach to finding a local minimum in the energy landscape of protein
    conformations. The
    underlying equation serving as the foundation of the statistical Monte Carlo
    <!-- ref original paper --> method is the Metropolis acceptance criterion: </p>
    <div class="row" style="height: 4em;">
        <img class="img-fluid center"
            style="max-height: 100%; width: auto; margin: 0 auto;">
    <br> p = exp(&minus;&Delta;E / (k<sub>B</sub>T)), where k<sub>B</sub> is the Boltzmann constant, &Delta;E the difference in
    energy of the two states and T the temperature. The term 1/(k<sub>B</sub>T) can also
    be written as a single factor &beta;.
    During the statistical protein folding based on the Monte Carlo method, the
    structure is changed by small random perturbations of the atom positions.
    Whether the new structure is accepted or
    not is decided by the Metropolis acceptance criterion.
    If &Delta;E &lt; 0, the structure is accepted; otherwise the newly proposed
    structure is accepted with probability p as described in the Metropolis
    acceptance criterion.
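    The acceptance rule described above can be illustrated with a minimal Python sketch. It is a toy one-dimensional search, not Rosetta's implementation: the function names, the quadratic energy function, the step size and the value of k<sub>B</sub>T (here <code>kT</code>) are assumptions chosen only for illustration.

```python
import math
import random


def metropolis_accept(delta_e, kT, rng=random):
    """Return True if a move with energy change delta_e is accepted."""
    if delta_e < 0:
        return True  # downhill moves are always accepted
    # uphill moves are accepted with probability p = exp(-delta_e / kT)
    return rng.random() < math.exp(-delta_e / kT)


def monte_carlo_minimize(energy, x0, steps=5000, step_size=0.1, kT=0.05, seed=0):
    """Toy 1D Metropolis Monte Carlo search (illustrative, hypothetical parameters)."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(steps):
        x_new = x + rng.uniform(-step_size, step_size)  # small random perturbation
        e_new = energy(x_new)
        if metropolis_accept(e_new - e, kT, rng):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


# Hypothetical energy landscape with a single minimum at x = 2.
best_x, best_e = monte_carlo_minimize(lambda x: (x - 2.0) ** 2, x0=0.0)
```

    Occasionally accepting uphill moves is what allows the search to leave shallow local minima; a lower <code>kT</code> makes the search greedier.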
    The RosettaCM protocol requires evolutionarily related structures
    as well as fragment files of the target structure.
    The fragment files serve as structure templates for the protein and
    consist of peptide fragments of 3 and 9 residues.
    We gathered five evolutionarily related structures from the RCSB PDB with
    accession numbers:
    The five RCSB entries represent different structures of sortases from
    <i>Staphylococcus aureus</i>.
    Fragment files can be created with the Robetta <a
    target="_blank">online server</a> or with the Rosetta FragmentPicker.
    The RosettaCM procedure is best described in the following steps:</p>
    <!-- source: RosettaCM page -->
        <li>sequence and structural alignment of templates</li>
        <li>fragment insertion in unaligned sections</li>
        <li>replacement of a random segment with a segment from a different template</li>
        <li>energy minimization</li>
        <li>all-atom optimization</li>
    The alignment can be performed with various tools. We used <a
    target="_blank">MAFFT</a> to
    generate the multiple sequence alignments.
    Before the alignments were used as input, they were converted to the
    alignment format that RosettaCM requires.
    The minimization is performed using the Rosetta centroid energy
    function. For
    the centroid function to be applied, the protein is converted to the centroid
    representation. A protein in centroid representation consists of the
    atoms N, C<sub>&alpha;</sub>, O<sub>Carbonyl</sub> and an atom of
    varying size representing the
    side chain. The advantage of using the centroid representation is that the
    energy landscape can be traversed more easily due to the smoother nature of the
    centroid energy landscape.
    Finally the generated structure undergoes a second minimization in an
    all-atom model by
    means of Monte Carlo optimization. This is similar to the energy
    minimization but without the amino acids being
    represented as centroids of their functional groups. Structures computed with
    all-atom optimizations can reach atomic resolutions
    {{Quelle rosetta paper}}
    which is crucial for a model meant to be used to estimate atomic
    The run yielded 15,000 structures, which were compared using the Rosetta
    scoring function (talaris2013).
    <!-- scoring -->
    From the 15,000 structures generated, we inspected the ten best-scoring structures.
    As can be seen in figure 5, the most prominent differences are
    found in the regions close to the N- and C-terminus. As
    fluctuations in those
    regions are typical, we decided to use the best-scoring
    structure, candidate S_14771 (figure 6), as the input for the
    simulations to follow.
<img class="img-fluid center"
        src="https://2019.igem.org/wiki/images/4/40/T--TU_Darmstadt--top10_corporate.png" style="max-width:50%" />
        <div class="caption">
              Figure x :
            The structural alignment of the ten best-scoring
            sortase structures
            displays minor differences with the exception of the C- and N-terminal
            regions. N- and C-terminal regions tend to show strong
            fluctuations, thus it is
            unsurprising to find the terminal regions to be unaligned.
<img class="img-fluid center"
        src="https://2019.igem.org/wiki/images/b/b3/T--TU_Darmstadt--s14771.gif" style="max-width:50%" />
        <div class="caption">
              Figure x :
            Sortase A7M candidate S_14771 created through RosettaCM.
        <p> In order to evaluate the secondary structure of the Sortase A7M
            candidate S_14771, a Ramachandran plot was created and compared to
            those of the five sortases used as input for the comparative modeling.
            Comparisons were also drawn with the sortase predicted by deep learning
            as well as a database of randomly sampled proteins.
            Ramachandran plots of the backbone dihedral angles (fig x) can be a first indicator
            of whether the computed structures are valid.
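            A Ramachandran plot is built from the backbone dihedral angles &phi; and &psi; of each residue. The following Python sketch shows how such angles can be computed from atom coordinates. It is an illustration only and not the code used for the plots in this section; the function names are hypothetical and the coordinates in the example are made up.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [x / norm for x in b1]
    # project b0 and b2 onto the plane perpendicular to the central bond b1
    v = _sub(b0, [_dot(b0, b1) * x for x in b1])
    w = _sub(b2, [_dot(b2, b1) * x for x in b1])
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def phi_psi(prev_c, n, ca, c, next_n):
    """phi = C(i-1)-N-CA-C and psi = N-CA-C-N(i+1) for one residue."""
    return dihedral(prev_c, n, ca, c), dihedral(n, ca, c, next_n)


# Made-up coordinates forming a right-angle dihedral.
right_angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

            Plotting the (&phi;, &psi;) pairs of all residues yields the Ramachandran plot; clusters in the typical &alpha;-helix and &beta;-sheet regions hint at the presence of secondary structure.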
<img class="img-fluid center"
        src="https://2019.igem.org/wiki/images/2/28/T--TU_Darmstadt--ramachandran_s14711.png" style="max-width:50%" />
        <div class="caption">
              Figure x :
<img class="img-fluid center"
        src="https://2019.igem.org/wiki/images/e/ee/T--TU_Darmstadt--ramachandran_five_sortases.png" style="max-width:50%" />
        <div class="caption">
              Figure x :
<img class="img-fluid center"
        src="https://2019.igem.org/wiki/images/7/73/T--TU_Darmstadt--Comp_Ramachandran.PNG" style="max-width:50%" />
        <div class="caption">
              Figure x :
            <b>Figure 5: </b> The comparison of the Ramachandran plot of
            structure S_14771 and the Ramachandran plot found on <a
            suggests that secondary structures are present. Hence the structure appears
            to contain &alpha;-helices, &beta;-sheets and a small amount of
    We used machine learning methods as well as Monte Carlo simulations to
    determine the structure of the mutated transpeptidase Sortase A7M.
    The machine learning approach using AlQuraishi's deep neural network yielded a
    structure which seemed not to have any secondary structures. To exclude the
    possibility of an error in the
    PyMOL visualization software by Schrödinger, a Ramachandran plot
    (figure xyz)
    was created. The plot shows that no typical secondary structures are present,
    which is a strong indicator of a failed approach to determine a structure.
    The approach using <i>Rosetta Comparative Modeling</i> yielded
    structures scored with the talaris2013 scoring function. The ten
    best structures
    were aligned and exhibited almost identical secondary structures
    (figure xzy).
    The greatest structural differences are present in the N- and C-terminal
    regions. Since terminal regions tend to fluctuate more strongly than
    non-terminal segments of the protein, we deemed those fluctuations irrelevant
    for the protein's functionality.
    Being the best-scoring candidate, structure S_14771 was analyzed
    using a Ramachandran plot (figure xyz). The plot shows all the
    relevant and
    typical structures sortases exhibit and serves as an indicator of a
    successful structure prediction.
    In the steps to follow, a molecular dynamics (MD)
    simulation will be performed on both structures. Even though
    candidate CASP12
    does not seem to be a valid structure, refolding processes during a
    simulation might lead to a relaxation of the protein and allow for a
    prediction of the Sortase A7M structure.
<h2>Molecular dynamics</h2>
    The structure predictions made so far were based on statistical methods with
    constraints. The deep
    learning algorithm uses a neural network trained to find a function associating the
    amino acid sequence with
    the final 3D positions of the atoms within the protein. On the other hand, predictions
    were made with Rosetta
    using the Monte Carlo method. Here random movement of individual atoms occurs,
    and the
    energy is estimated after
    each step.
    Even though both methods use physical constraints to find plausible protein
    structures, neither of them actually
    simulates the behavior of these molecules within a physical force field.
    Moreover, neither method necessarily outputs a fully relaxed protein
    structure, and both simulate water only implicitly by preferring hydrophilic parts of the
    protein to be on the outside. Thus, we conducted a molecular dynamics (MD)
    simulation to verify the plausibility of our protein structure and allow it to relax.
    The molecular dynamics simulation provides the opportunity to simulate water as
    discrete molecules, creating a solvated protein. This step is crucial to
    validate the structures, as the interaction with water is one of the primary
    mechanisms of protein folding.
    Since neither candidate CASP12 nor S_14771 has been modeled with explicit water,
    an MD simulation is imperative to
    verify the correctness of the candidates' conformations.
    This is, of course, much more expensive in terms of computational resources, as
    the protein has to be placed in a simulation box,
    which is then filled with water molecules. This is called solvation and is
    visualized for candidate S_14771 in figure x.
<img class="img-fluid center"
        src="https://2019.igem.org/wiki/images/0/08/T--TU_Darmstadt--MoleculeInWater.png" style="max-width:50%" />
        <div class="caption">
              Figure x :
            Sortase A7M in a force field surrounded by discrete
            water molecules. Image was made with gmxSolvate.
    We used GROMACS (GROningen MAchine for Chemical Simulations)
    <!-- cite --> as the tool for our molecular dynamics simulations. GROMACS solves the
    equations of motion for
    individual atoms
    <sup id="cite_ref-1" class="reference">
    <a href="#cite_note-1">[1] </a>
    . While this classical simulation is much more accurate than predictions made by
    other methods,
    approximations are used nonetheless: forces are cut off beyond a certain radius and
    the system
    size is quite small.
    <sup id="cite_ref-1" class="reference">
        <a href="#cite_note-1">[1] </a>
    Additionally, atoms are assumed to be classical particles, which is not strictly the
    case, as quantum mechanics plays a role in particle-particle interactions.
    Still, this simulation is computationally very expensive. Therefore, only time
    periods of less
    than one second could be simulated.
    To perform the molecular dynamics simulations we mostly followed the <a
        target="_blank">GROMACS Lysozyme tutorial</a> as it serves our purpose
    well. We created a dodecahedral simulation box with a distance of 0.7
    nm between the solute and the box borders. We used periodic boundary
    conditions and a Na<sup>+</sup> Cl<sup>-</sup> concentration of 0.012 mol/L. The
    main difference of our approach was that we used the CHARMM36
    <!-- cite --> force field instead of the OPLS-AA/L force field and adjusted
    our molecular dynamics parameters <a
    The simulation was performed on an NVIDIA GTX 760 graphics card, allowing us to
    simulate approximately 1 ns per hour.
    To analyze the MD simulation we used the Python programming language and the <a
        href="https://www.biotite-python.org/" target="_blank">Biotite package</a>
    <!-- cite --> as well as GROMACS analysis tools such as
    <!-- links to these tools --> <a>covar</a> and anaeig.
    The first analyses are the root-mean-square deviation (RMSD), the root-mean-square
    fluctuation (RMSF) and the radius of gyration.
    RMSD calculations have been described in the structure prediction section. To
    compute the RMSF, the displacement of each
    residue is evaluated as a root-mean-square over time:
<img class="img-fluid center"
        src="https://2019.igem.org/wiki/images/2/26/T--TU_Darmstadt--RMSF.png" style="max-width:50%" />
        <div class="caption">
              Figure x :
<p>where v(t)<sub>i</sub> is the position of atom i at time t. The radius of
    gyration is
    <!-- revise -->
    The final analysis performed on the MD simulation is called Principal Component
    Analysis (PCA).
    By applying PCA to a protein it is possible to gain insights into the relevant
    vibrational motions and thereby the physical mechanism of the protein
    <!-- citation -->.
            <ol class="references">
              <li id="cite_note-1">
                <span class="mw-cite-backlink">
                  <a href="#cite_ref-1">↑</a>
                <span class="reference-text">
                Hee-Jin Jeong, Gita C. Abhiraman, Craig M. Story, Jessica R. Ingram, Stephanie K. Dougan, Generation of Ca2+-independent sortase A mutants with enhanced activity for protein and cell surface labeling, PLOS ONE, 2017
                <a rel="nofollow" class="external autonumber" href="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0189068">[1] </a>
              <li id="cite_note-2">
                <span class="mw-cite-backlink">
                          <a href="#cite_ref-2">↑</a>
                      <span class="reference-text">
                          Hongyuan Mao, Scott A. Hart, Amy Schink, and Brian A. Pollok, Sortase-Mediated Protein Ligation: A New Method for Protein Engineering, JACS, 2004
                          <a rel="nofollow" class="external autonumber"
                              href="https://pubs.acs.org/doi/10.1021/ja039915e">[2] </a>
    The first possible indicators of a stable protein structure are a converging
    root-mean-square deviation (RMSD),
    small root-mean-square
    fluctuation (RMSF) values
    and a converging radius of gyration. Using Python and
    the Biotite package we calculated
    these quantities and plotted the results for both candidate S_14771 and
    candidate CASP12.
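    To make the three quantities concrete, the following sketch computes them for a tiny made-up trajectory using only the Python standard library. It is not the Biotite-based analysis used here, and it rests on assumptions: the frames are taken to be already superimposed on the reference, the RMSF is measured around each atom's time-averaged position, and all atoms are given equal mass for the radius of gyration.

```python
import math


def rmsd(frame, reference):
    """RMSD between two frames with identical atom order (already superimposed)."""
    sq = [sum((a - b) ** 2 for a, b in zip(p, q)) for p, q in zip(frame, reference)]
    return math.sqrt(sum(sq) / len(sq))


def rmsf(trajectory):
    """Per-atom RMSF around each atom's time-averaged position."""
    n_frames, n_atoms = len(trajectory), len(trajectory[0])
    result = []
    for i in range(n_atoms):
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        sq = [sum((frame[i][k] - mean[k]) ** 2 for k in range(3)) for frame in trajectory]
        result.append(math.sqrt(sum(sq) / n_frames))
    return result


def radius_of_gyration(frame):
    """Radius of gyration, assuming equal masses for all atoms."""
    n = len(frame)
    center = [sum(p[k] for p in frame) / n for k in range(3)]
    sq = [sum((p[k] - center[k]) ** 2 for k in range(3)) for p in frame]
    return math.sqrt(sum(sq) / n)


# Made-up two-atom, two-frame trajectory: the second atom moves along x.
trajectory = [
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
]
rmsd_last = rmsd(trajectory[-1], trajectory[0])
rmsf_per_atom = rmsf(trajectory)
rg_first = radius_of_gyration(trajectory[0])
```

    The RMSD and the radius of gyration give one value per frame and are therefore plotted over time, whereas the RMSF gives one value per atom or residue and is plotted along the sequence.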
<img class="img-fluid center"
        src="https://2019.igem.org/wiki/images/4/4f/T--TU_Darmstadt--rmsd_s14771.png" style="max-width:50%" />
        <div class="caption">
              Figure x :
            The RMSD, one of three main indicators of a stable
            protein structure, during the MD simulation of
            S_14771 over a period of 200,000 ps. As time progresses, the RMSD
            increases with a decreasing slope.
            The value stabilizes at about 110,000 ps and then fluctuates around a
            value of 6 &#8491;.
<img class="img-fluid center"
        src="https://2019.igem.org/wiki/images/a/aa/T--TU_Darmstadt--rmsd_casp.png" style="max-width:50%" />
        <div class="caption">
              Figure x :
            Already at t = 40,000 ps the RMSD has arrived at a
            stable value, while at the same time
            the radius of gyration (fig x) decreases continuously over time. This
            suggests the protein
            might be folding and potentially developing secondary structures not
            previously present.
<!-- -->
<img class="img-fluid center"
        src="https://2019.igem.org/wiki/images/9/94/T--TU_Darmstadt--gyration_s14771.png" style="max-width:50%" />
        <div class="caption">
              Figure x :
            The prominent fluctuations of residues
            105 to 115 might
            indicate a binding site or another form of functional structure. The
            radius of gyration, just like
            the RMSD (fig xyz), stabilizes around a simulation time of 110,000 ps
            and converges towards a value of
            16.7 &#8491;.
<h2><span class='h3bb'>Sequence and Features</span></h2>
<partinfo>BBa_K3187028 SequenceAndFeatures</partinfo>
<img class="img-fluid center"
        src="https://2019.igem.org/wiki/images/0/03/T--TU_Darmstadt--gyration_casp.png" style="max-width:50%" />
        <div class="caption">
              Figure x :
            From t = 40,000 ps onward, the radius of gyration
            decreases steadily. At the end of the simulation the radius of gyration
            reaches a value of 17 &#8491;.
            This behavior indicates folding of the protein structure.
<img class="img-fluid center"
        src="https://2019.igem.org/wiki/images/f/f4/T--TU_Darmstadt--rmsf_s14771.png" style="max-width:50%" />
        <div class="caption">
              Figure x :
            The fluctuations
            (RMSF) of most residues appear insignificant compared to those of the first and
            last residues and
            the residues close to residue 110. Typically the N- and C-termini tend
            to fluctuate more strongly due to the lack of
            stabilizing structures. The prominent fluctuations in the range of
            residues 105 to 115
            can indicate a binding site or another form of functional structure.
<img class="img-fluid center"
        src="https://2019.igem.org/wiki/images/a/aa/T--TU_Darmstadt--rmsf_casp.png" style="max-width:50%" />
        <div class="caption">
              Figure x :
            The prominent fluctuations of residues
            105 to 115 might
            indicate a binding site or another form of functional structure. The
            radius of gyration, just like
            the RMSD (fig xyz), stabilizes around a simulation time of 110,000 ps
            and converges towards a value of
            16.7 &#8491;.
    Typically, RMSDs and radii of gyration converge towards a value dependent on the
    size of the protein. Convergence of those quantities can be interpreted as a stable
    state of the protein structure. As can be seen in Figures x and y, both the RMSD
    and the radius of gyration stabilize as the simulation reaches 110,000 ps (110 ns),
    suggesting a now stabilized structure of candidate S_14771 solvated in water.
    Another indicator of a functional protein is the RMSF. Instead of being averaged
    over all atoms, the RMSF is averaged over time with respect to each amino acid. It
    provides insights into both stability and functionality. Fig xzf reveals the RMSF of
    residues 105 to 115 to be significantly higher than that of other residues. This
    hints at the presence of a functional unit along these residues. As commented on in
    the section describing our structure prediction approaches, the N- and C-terminal
    regions tend to fluctuate more strongly as a result of the absence of stabilizing
    structures.
    RMSD and radius of gyration calculations of candidate CASP12 (figures x and y)
    provide evidence of folding.
    However, the RMSF values are significantly higher, an
    effect possibly caused by instability or refolding. Nevertheless, the strongest
    fluctuations, disregarding the terminal regions, can be seen in the region of
    residues 105 to 115. This insight supports the theory that residues 105 to 115 might
    be a part of a functional unit.
    We were unsure whether candidate CASP12 can be considered a plausible structure
    and how to interpret the findings concerning the prominent fluctuations. Therefore,
    we decided to perform a
    <i>Principal Component Analysis</i>.
<h3>Principal component analysis</h3>
    To analyze our system further, a Principal Component Analysis (PCA) was performed
    using GROMACS.
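    The idea behind PCA of a trajectory can be sketched in a few lines of Python: the covariance matrix of the atomic coordinates is built over all frames, and its eigenvector with the largest eigenvalue describes the most prominent collective motion. The sketch below is a simplified stand-in for the GROMACS tools, not their implementation. It assumes already superimposed frames, uses no mass weighting, divides by the number of frames, and finds only the dominant mode by power iteration. All names and the example data are hypothetical.

```python
import math


def covariance_matrix(frames):
    """Covariance of flattened coordinates; each frame is a vector of equal length."""
    n_frames, dim = len(frames), len(frames[0])
    mean = [sum(f[k] for f in frames) / n_frames for k in range(dim)]
    centered = [[f[k] - mean[k] for k in range(dim)] for f in frames]
    return [
        [sum(c[a] * c[b] for c in centered) / n_frames for b in range(dim)]
        for a in range(dim)
    ]


def dominant_mode(cov, iterations=200):
    """Power iteration: eigenvalue and unit eigenvector of the largest mode."""
    dim = len(cov)
    vec = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        new = [sum(cov[a][b] * vec[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:
            break  # no variance along the current direction
        vec = [x / norm for x in new]
    product = [sum(cov[a][b] * vec[b] for b in range(dim)) for a in range(dim)]
    eigenvalue = sum(v * p for v, p in zip(vec, product))
    return eigenvalue, vec


# Made-up single-atom trajectory: large motion along x, small motion along y.
frames = [(-2.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 0.5, 0.0), (0.0, -0.5, 0.0)]
eigenvalue, mode = dominant_mode(covariance_matrix(frames))
```

    Modes with large eigenvalues correspond to large-amplitude collective motions, while modes with small eigenvalues describe small local fluctuations.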
<img class="img-fluid center"
        <p><b>Animation 4: </b> A Principal Component Analysis of a fast (blue) and a
            slow (red) mode showing the most prominent movements of the C&alpha;-chain
            of candidate S_14771. Both modes show movement of the &beta;6&#47;&beta;7
            loop consisting of residues 105 to 115 towards the active site. Thus we can
            assume that the closing &beta;6&#47;&beta;7 loop is involved in the reaction mechanism.
    The results from the Principal Component Analysis of candidate S_14771
    (animation xy) show a movement of residues 105 to 115 towards the active
    site, supporting our theory that residues 105 to 115 are important for the
    reaction mechanism. Since the slow mode (red), which shows the most relevant
    movement of the sortase, moves further towards the active site, it is possible
    that the &beta;6&#47;&beta;7 loop either closes the binding site of the ligand
    peptides or even transports one peptide towards the other.
    Animation xyz shows the results of the Principal Component Analysis of candidate
    CASP12. As the RMSF calculations suggested (fig xyz), the whole protein seems to
    be moving randomly with no directed movement.
    In addition, the active site amino acids
    <!-- ref --> are spread across the protein, confirming our assumption that the
    protein is not in a stable or plausible conformation.
    By applying various methods to analyze the structural stability and validity of
    our two Sortase A7M candidates, we gained evidence that at least one of our
    Sortase A7M models is a valid and stable candidate. The candidate S_14771
    that was generated using <i>RosettaCM</i> appears to be a fitting candidate not
    only due to successful analyses, but also since the residues of the active site
    <!-- ref --> are close enough to each other to catalyze a ligation reaction.
    Our model created through deep learning performed well only in terms of RMSD and
    radius of gyration calculations. The RMSF, the Principal Component Analysis
    and the conformation of the active site all show candidate CASP12 to be
    of no use for further calculations, as it does not portray a valid conformation
    of Sortase A7M.
    Now that the binding site of the Sortase had been found, the peptide ligand
    needed to be inserted into the binding site to create a peptide-protein complex.
    The procedure of choice
    for the introduction of a ligand into the binding site of a protein is called
    <i>docking</i>. In the
    following sections, we will present the protocol and methods we used as well as
    the results they yielded.
    Enzymes are among the most relevant macromolecules in biology. Their
    function is determined by the way they interact with their ligands.
    Although enzymes are highly specific concerning the ligands they interact with,
    similar compounds can often bind to the same enzyme, albeit with different affinities.
    To determine the best possible binding conformation of the protein-ligand
    complex, we use FlexPepDock, an algorithm provided by the RosettaCommons
    software package.
    The ab-initio FlexPepDock protocol consists of multiple steps and is documented
    in the RosettaCommons <a href="">online documentation</a>. We modified the
    protocol as the one provided did not work with our approach.
    The modified protocol has the following form:
    <li>secondary structure determination</li>
    <li>complex creation</li>
    <li>FlexPepDock refinement</li>
    To determine the secondary structure of the peptide, fragment files (3- and
    5-mers) had to be generated and a PSIPRED secondary structure prediction had to
    be performed. As the peptides had a sequence length less than 20 amino acids, we
    were not able to use the online services such as <a
        href="http://robetta.bakerlab.org/">Robetta</a> and the <a
        href="http://bioinf.cs.ucl.ac.uk/psipred/">PSIPRED online service</a>.
    Instead we used the Rosetta <a
        application</a> and the PSIPRED <a
        href="https://github.com/psipred/psipred">command line tool</a>.
    The generated structures serve as the input for the refinement protocol.
    The generation of the peptide-protein complex can be divided into three steps:
    <li>peptide creation</li>
    <li>peptide relaxation</li>
    <li>coarse complex creation</li>
    The peptide structure was created through ab-initio modeling.
    Initial creation of the peptide was followed by insertion of the peptide into
    the sortase binding site. This led to a coarse model of the peptide-sortase
    complex. Here we used insight gained from the molecular dynamics simulation to
    place the peptide close to the binding site.
    <!-- maybe mention Biotite here already -->
    In the final step the FlexPepDock refinement protocol is executed and 50,000
    complex structures are generated. We used the inputs as described in
    {{fuhrman paper}}, written by the authors of the FlexPepDock documentation.
    To get a better overview of our data we performed a clustering in Python,
    using the scikit-learn package. We clustered the structures with respect to:
    <li>total score: the total score of the docking provided by the <i>Rosetta</i>
        scoring function</li>
    <li>interface score: the sum of the energies of the residues in the interfacing region</li>
    <li>reweighted score: a score calculated by double weighting the contribution of
        the residues in the interfacing region</li>
    <li>root-mean-square deviation: the root-mean-square deviation of the peptides
        in relation to the structure with the highest score</li>
    <li>peptide direction: the direction the peptide is facing</li>
    Here clustering is used to group the docking results and thereby decrease the
    sample size.
    From the 50,000 results we picked the results with the 500 best total scores,
    the 500 best interface scores and
    the 500 best reweighted scores.
    As we aimed to create an unbiased set for clustering, the absence of duplicates
    in the set was ensured.
    We decreased the sample size to 100 groups representing the best-scoring
    structures from the three categories.
    Of the 50,000 structures created, the clustering left a sample of 100 structures
    of docked complexes.
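    The pre-selection step described above, taking the best results per score and removing duplicates before clustering, could look roughly like the following Python sketch. It is not the script used here and covers only the selection, not the scikit-learn clustering. The dictionary keys, the <code>tag</code> identifier and the example values are hypothetical, and the sketch assumes that a lower score is better.

```python
def select_top_unique(results,
                      score_names=("total_score", "interface_score", "reweighted_score"),
                      n_best=500):
    """Union of the n_best entries per score without duplicates.

    Assumes lower scores are better and that each entry has a unique 'tag'.
    """
    selected = {}
    for score_name in score_names:
        ranked = sorted(results, key=lambda entry: entry[score_name])
        for entry in ranked[:n_best]:
            # a structure that ranks high in several categories is kept only once
            selected.setdefault(entry["tag"], entry)
    return list(selected.values())


# Made-up docking results for illustration.
results = [
    {"tag": "a", "total_score": -10.0, "interface_score": -1.0, "reweighted_score": -5.0},
    {"tag": "b", "total_score": -9.0, "interface_score": -3.0, "reweighted_score": -6.0},
    {"tag": "c", "total_score": -1.0, "interface_score": -2.0, "reweighted_score": -1.0},
    {"tag": "d", "total_score": 0.0, "interface_score": 0.0, "reweighted_score": 0.0},
]
subset = select_top_unique(results, n_best=1)
selected_tags = [entry["tag"] for entry in subset]
```

    With 500 entries per score, the resulting set holds at most 1,500 structures; it is smaller whenever a structure ranks among the best in more than one category.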
<img class="img-fluid center"
        src="https://2019.igem.org/wiki/images/7/78/T--TU_Darmstadt--dock_lpetgg.png" style="max-width:50%" />
        <div class="caption">
              Figure x :
            The three best-scoring structures (total score, interface
            score, reweighted score) of the LPETGG-tag are shown. Only two results are
            visible as the best reweighted score candidate is identical to the best
            interface score candidate. The reacting section of the LPETGG-tag, namely
            glycine, is colored yellow, as is the active site. The glycine of both ligand
            peptides is facing the active site.
    Analysis of the scores has shown a similar score for all the three dockings. The
    best scoring results of the LPETGG docking show a tendency of the glycines to
    face the active site while also being in close proximity to the active site.
<img class="img-fluid center"
        src="https://2019.igem.org/wiki/images/8/8d/T--TU_Darmstadt--dock_polyg.png" style="max-width:50%" />
        <div class="caption">
              Figure x :
            The three best-scoring structures (total score,
            interface score, reweighted score) of the poly-G peptide are shown. Only
            two results are visible as the best reweighted score candidate is
            identical to the best interface score candidate. Instead of facing the
            active site (yellow), the reacting glycines (yellow) appear to interact
            with the &beta;6&#47;&beta;7 loop of the sortase.
<img class="img-fluid center"
        src="https://2019.igem.org/wiki/images/9/92/T--TU_Darmstadt--dock_mpolyg.png" style="max-width:50%" />
        <div class="caption">
              Figure x :
            The three best scoring structures (total score,
            interface score, reweighted score) of the M-polyG peptide are shown. Only
            two results are visible as the best reweighted score candidate is
            identical to the best total score candidate.
            For the M-polyG peptide no uniform directional orientation can
            be observed.
            The structure with the best interface score (light blue) is oriented
            towards the loop while the structure with the best total/reweighted score
            (dark blue) is oriented towards the &beta;-sheets.
    Figure lpetgg
    <!-- change this as well --> shows the docking result of the LPETGG peptide to
    the sortase. The results shown are the best scoring structures of the clustering
    with respect to the total score, interface score and reweighted score. As the
    best scoring structure is the same for the total score and the reweighted score,
    only two peptides are shown. This also applies to figures x and y. For both
    results the reacting glycine residues (yellow) are facing the active site.
    Additionally, the same residues are in close proximity to the active site.
    Figures x and y show the docking of polyG and M-polyG. While the polyG
    results align well and seem to be interacting with the &beta;6&#47;&beta;7 loop
    rather than with the active site, this does not seem to be the case for M-polyG.
    Instead of both structures interacting with either the &beta;6&#47;&beta;7 loop or
    the active site, one (best interface score; dark blue) interacts with the
    &beta;6&#47;&beta;7 loop and the other (best reweighted/total score; light
    blue-gray) appears to interact with the active site.
<img class="img-fluid center"
        src="https://2019.igem.org/wiki/images/7/76/T--TU_Darmstadt--dock_zoom_active.png" style="max-width:50%" />
        <div class="caption">
              Figure x :
            The close up of the M-polyG peptide (best
            total/reweighted score) indicates an interaction of methionine with
            arginine<sub>139</sub> and cysteine<sub>126</sub>.
<img class="img-fluid center"
        src="https://2019.igem.org/wiki/images/4/48/T--TU_Darmstadt--dock_zoom_loop.png" style="max-width:50%" />
        <div class="caption">
              Figure x :
            Methionine of the result with the best interface score
            interacted with the &beta;6&#47;&beta;7 loop rather than the active
            site. Still the reactive glycine residues appear to be bound to the
            &beta;6&#47;&beta;7 loop.
    <!-- mention methionine -->
    As can be seen in figure 16, visualizing the result of the corresponding
    docking simulation (best total/reweighted score) suggests an interaction
    between methionine and two active site residues, namely
    arginine<sub>139</sub> and cysteine<sub>126</sub>.
    Figure 17 shows the interaction of M-polyG with the &beta;6&#47;&beta;7 loop.
    The glycines still interact with the &beta;6&#47;&beta;7 loop.
    However, instead of binding above the &beta;6&#47;&beta;7 loop, as is the case
    for polyG (illustrated in fig z), the interaction seems to be influenced by
    methionine. By interacting with the residues in the &beta;-helix,
    methionine could potentially hinder binding of glycine to the
    &beta;6&#47;&beta;7 loop by partial immobilization of the peptide. Overall,
    peptide binding and orientation are less uniform compared to
    polyG without the leading methionine, which could be an indicator of a lower
    binding affinity of M-polyG towards the &beta;6&#47;&beta;7 loop.
    To computationally investigate the binding affinities of the polyG and M-polyG
    as well as the LPETGG tags, we performed docking simulations using the
    <i>Rosetta FlexPepDock</i> application. We used a modified version of the
    recommended protocol, as the modified version was easier to automate and
    served our purpose better than the standard protocol.
    From the calculated scores alone, we could not see a difference in binding
    affinities. Thus, we inspected the best scoring
    structures regarding the total score, the interface score and the reweighted
    score using PyMOL.
    Since the best structures with respect to total score and reweighted score were
    the same for all simulations,
    only two structures were inspected per run. A polyproline tag was appended
    to all the peptides to simulate
    the modification of the VLPs with a small peptide.
    <!-- GRoß helices etc erwähnen als begründung -->
    As expected, the results showed that for LPETGG, the glycines of both peptides
    oriented towards the active site.
    This is unsurprising, as peptides with the sequence LPXTGG are known to be
    substrates of the sortase. It was more surprising to
    see the polyG tag oriented away from the active site, since polyG is also a
    known substrate of the sortase. Both polyG peptides
    were facing the &beta;6&#47;&beta;7 loop (residues 105 to 115) uniformly and
    appeared to be interacting with it. The M-polyG peptides did not
    show a uniform orientation or interaction scheme. On the one hand, the
    visualization of the best result concerning the total and reweighted
    score shows an interaction of methionine with cysteine<sub>126</sub> and
    arginine<sub>139</sub>, two residues of the active
    site. On the other hand, the visualization of the best result with respect to
    the interface score shows the M-polyG facing the mobile &beta;6&#47;&beta;7
    loop. In contrast to the polyG peptide lacking the methionine, the M-polyG
    peptide is pulled down below the &beta;6&#47;&beta;7 loop by the methionine
    interacting with one of the &beta;-sheets leading to the active site. This is
    not the case with the polyG results, which lie aligned in one plane
    with the &beta;6&#47;&beta;7 loop.
<h2>Modeling Conclusion</h2>
    generic filler text

Revision as of 03:32, 21 October 2019


Sortase A7M (Ca2+-independent variant)


Name Sortase A7M
Base pairs 450
Molecular weight 17.85 kDa
Origin Staphylococcus aureus, synthetic
Properties Ca2+-independent, transpeptidase, linking sorting motif LPXTG to poly-glycine Tag

Usage and Biology

generic filler text

Transpeptidase: Sortase

generic filler text


generic filler text

Sortase variants

generic filler text

Sortase A7M

generic filler text


generic filler text


generic filler text

Expression and purification

generic filler text


generic filler text

Fluorescence Resonance Energy Transfer (FRET)

generic filler text

Mass Spectrometry

generic filler text


generic filler text

Characterization of Sortase A7M (and comparison to Sortase A5M)

generic filler text

How do we measure if our purified sortases are active?

generic filler text

How do we measure sortase reaction kinetics?

generic filler text

Development of a new FRET pair

generic filler text

Why are enzyme-substrate ratio and duration important parameters of the sortase reaction?

generic filler text

Who wins - Sortase A7M or Sortase A5M

generic filler text

What about other substrates?

generic filler text

Primary Amines

generic filler text


generic filler text

Is Sortase A7M able to attach cargo to P22 coat protein?

generic filler text

Does methionine affect Sortase linking?

generic filler text

Are there other Sortases that might be useful?

generic filler text



In synthetic biology, theoretical models are often used to gain insights and to predict and improve experiments. In our project we are modifying virus-like particles (VLPs) by attaching proteins to the surface of the P22 capsid through a linker. The linking is catalyzed by the enzyme Sortase A7M, a calcium-independent mutant of the wild-type Sortase A from Staphylococcus aureus. We performed modeling to predict the unknown structure of Sortase A7M and to improve the linker between the proteins, thereby optimizing the modification efficiency of our platform.
Two different modeling approaches were used to determine the structure of Sortase A7M. We compared machine learning approaches to traditional comparative, Monte-Carlo based modeling methods. The results were evaluated using an energy-scoring function and molecular dynamics (MD) simulations. The most promising Sortase A7M structures were used to perform a docking simulation to screen for optimal linkers.

Structure determination

In silico modeling and simulation of proteins requires a 3D structure, which can be obtained from the RCSB Protein Data Bank. However, if no 3D structure is available, as is the case with Sortase A7M, the structure has to be determined by other means. We predicted the structure of Sortase A7M using two different approaches.



In our second approach we used RosettaCommons comparative modeling (RosettaCM), which is based on homology modeling. Homology modeling is a protein modeling method that requires one or more template structures on which the target protein is modeled. The template sequences are aligned with the sequence of the target protein. Unaligned sections are modeled using fragment or protein libraries, so that the resulting structures are based on different sequence homologues of the protein of interest. Ab-initio or de novo modeling, on the other hand, attempts to find protein structures solely based on physicochemical principles applied to the primary sequence, which can be compared to the refolding of a denatured protein.

RosettaCM combines ab-initio modeling with homology modeling. The homologous sections, for which a resolved 3D structure with a sufficiently similar sequence exists, are generated using homology modeling. Afterwards the unaligned sequences are modeled de novo. By combining the two methods, RosettaCM represents a precise and resource-efficient tool for protein structure prediction. Rosetta applications rely on Monte-Carlo optimization, which is a probabilistic approach to finding a local minimum in the energy landscape of protein conformations. The equation underlying the statistical Monte-Carlo method is the Metropolis acceptance criterion:

$$p = \exp\left(-\frac{\Delta E}{k_B T}\right)$$

where kB is the Boltzmann constant, ΔE the difference in energy of the two states and T the temperature. The factor 1/(kBT) can also be written as a single factor β.

During the statistical protein folding based on the Monte-Carlo method, the initial structure is changed by small random perturbations of the atom locations. Whether the structure is accepted or not is decided by the Metropolis acceptance criterion. If ΔE < 0, the structure is accepted, otherwise the newly proposed structure is accepted with probability p as described in the Metropolis acceptance criterion.
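
The acceptance rule described above can be illustrated with a minimal sketch. This is not Rosetta code: the one-dimensional energy function `toy_energy`, the step size and the value of `kt` are assumptions chosen purely for illustration.

```python
import math
import random


def metropolis_accept(delta_e, kt, rng=random):
    """Return True if a move with energy change delta_e is accepted.

    delta_e and kt (standing for k_B * T) must share the same energy unit.
    """
    if delta_e < 0:
        return True  # moves that lower the energy are always accepted
    # otherwise accept with probability p = exp(-delta_e / kT)
    return rng.random() < math.exp(-delta_e / kt)


def toy_energy(x):
    """Hypothetical one-dimensional energy landscape with its minimum at x = 2."""
    return (x - 2.0) ** 2


def monte_carlo_minimize(x0, steps=5000, step_size=0.1, kt=0.05, seed=0):
    """Random-walk minimization driven by the Metropolis criterion."""
    rng = random.Random(seed)
    x, energy = x0, toy_energy(x0)
    for _ in range(steps):
        candidate = x + rng.uniform(-step_size, step_size)  # small random perturbation
        candidate_energy = toy_energy(candidate)
        if metropolis_accept(candidate_energy - energy, kt, rng):
            x, energy = candidate, candidate_energy
    return x


if __name__ == "__main__":
    print(round(monte_carlo_minimize(x0=-3.0), 2))
```

Starting far from the minimum, the walk drifts towards x = 2 and then fluctuates around it; a larger `kt` makes uphill moves more likely and widens those fluctuations.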


The RosettaCM protocol requires evolutionarily related structures and sequences, as well as fragment files of the target structure. The fragment files serve as structure templates for the protein and consist of peptide fragments of sizes 3 and 9. We gathered five evolutionarily related structures from the RCSB PDB with the accession numbers:

  • 1ija
  • 1itw
  • 1itp
  • 1ito
  • 2mlm

The five RCSB entries represent different structures of sortases from Staphylococcus aureus. Fragment files can be created with the Robetta online server or with the Rosetta FragmentPicker application.

The RosettaCM procedure is best described in the following steps:

  1. sequence and structural alignment of templates
  2. fragment insertion in unaligned sections
  3. replacement of random segment with segment from a different template structure
  4. energy minimization
  5. all-atom optimization

The alignment can be performed with various tools. We used MAFFT to generate the multiple sequence alignments. Because RosettaCM requires alignments in the Grishin format, we converted them before using them as input. The minimization is performed using the Rosetta centroid energy function. For the centroid function to be applied, the protein is converted to the centroid representation. A protein in centroid representation consists of the backbone atoms N, Cα and the carbonyl O, and an atom of varying size representing the side chain. The advantage of the centroid representation is that its smoother energy landscape can be traversed more easily. Finally, the generated structure undergoes a second minimization in an all-atom model by means of Monte-Carlo optimization. This is similar to the energy minimization, but without the amino acids being represented as centroids of their functional groups. Structures computed through all-atom optimization can reach atomic resolution {{Quelle rosetta paper}}, which is crucial for a model meant to be used to estimate atomic interactions.


The run yielded 15,000 structures which have been compared using the Rosetta scoring functions (talaris2013). From the 15,000 structures generated, we inspected the ten best scoring structures.
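
Picking the best scoring structures from such a run can be sketched as follows. The sketch assumes a Rosetta-style score file in which every data line starts with `SCORE:` and the first such line holds the column names; the three example lines and their values are invented for illustration and are not results from our run. Lower Rosetta energies are better, so the list is sorted in ascending order.

```python
def best_scoring(score_lines, n=10, column="total_score"):
    """Return the n (description, score) pairs with the lowest score."""
    header = None
    rows = []
    for line in score_lines:
        if not line.startswith("SCORE:"):
            continue  # skip SEQUENCE: and other non-score lines
        fields = line.split()[1:]
        if header is None:
            header = fields  # the first SCORE: line holds the column names
            continue
        record = dict(zip(header, fields))
        rows.append((record["description"], float(record[column])))
    return sorted(rows, key=lambda row: row[1])[:n]


# invented example data, for illustration only
example = """SEQUENCE:
SCORE: total_score rms description
SCORE: -250.1 1.2 S_00001
SCORE: -310.7 0.9 S_00002
SCORE: -275.4 1.1 S_00003
""".splitlines()

print(best_scoring(example, n=2))
```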

As can be seen in figure 5, the most prominent differences can be found in the regions close to the N- and C-terminus. As fluctuations in those regions are not untypical, we decided to use the best scoring structure, candidate S_14771 (figure 6), as the input for the simulations to follow.

Figure x : The structural alignment of the ten best scoring sortase structures displays only minor differences, with the exception of the C- and N-terminal regions. N- and C-terminal regions tend to show strong fluctuations, thus it is unsurprising to find the terminal regions to be unaligned.

Figure x : Sortase A7M candidate S_14771 created through RosettaCM.

In order to evaluate the secondary structure of the Sortase A7M candidate S_14771, a Ramachandran plot was created and compared to those of the five sortases used as input for the comparative modeling. Comparisons were also drawn with the sortase predicted by deep learning as well as a database of randomly sampled proteins. Ramachandran plots of dihedral angles (fig x) can be a first indicator of whether the computed structures are valid.

Figure x : Caption?

Figure x : Caption?

Figure x : Caption?

Figure 5: The comparison of the Ramachandran plot of structure S_14771 and the Ramachandran plot found on Proteopedia suggests that secondary structures are present. Hence the structure appears to contain α-helices, β-sheets and a small amount of left-handed α-helices.
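
A Ramachandran plot shows the backbone dihedral angles φ and ψ of every residue. The following sketch shows how a single dihedral angle can be computed from four atom positions; it is a generic geometric formula, not the plotting code used for the figures, and the coordinates in the example are made up. For φ the four atoms are C(i-1), N(i), Cα(i), C(i); for ψ they are N(i), Cα(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    u1, u2, u3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(u1, u2), _cross(u2, u3)  # normals of the two planes
    y = math.sqrt(_dot(u2, u2)) * _dot(u1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# made-up coordinates: the last bond is rotated by a quarter turn
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```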


We used machine learning methods as well as Monte-Carlo simulations to determine the structure of the mutated transpeptidase Sortase A7M. The machine learning approach using AlQuraishi's deep neural network yielded a structure which did not seem to have any secondary structure. To exclude the possibility of an error in the PyMOL visualization software by Schrödinger, a Ramachandran plot (figure xyz) was created. The plot shows that no typical secondary structures are present, which is a strong indicator that this approach failed to determine a structure. The second approach, using Rosetta comparative modeling, yielded 15,000 structures scored with the talaris2013 scoring function. The ten best structures were aligned and exhibited almost identical secondary structures (figure xzy). The greatest structural differences are present in the N- and C-terminal regions. Since terminal regions tend to fluctuate more strongly than non-terminal segments of the protein, we deemed those fluctuations not relevant for the protein's functionality.
Being the best scoring candidate, structure S_14771 was analyzed structurally using a Ramachandran plot (figure xyz). The plot shows all the relevant and typical structures that sortases exhibit and serves as an indicator of a successful structure prediction.
In the steps to follow, a molecular dynamics (MD) simulation will be performed on both structures. Even though structure CASP12 does not seem to be a valid structure, refolding processes during an MD simulation might lead to a relaxation of the protein and allow for a promising prediction of the Sortase A7M structure.

Molecular dynamics


The structure predictions made so far were based on statistical methods with physical constraints. The Deep Learning algorithm uses a neural network trained to find a function associating the amino acid sequence and the final 3D positions of the atoms within the protein. On the other hand, predictions were made with Rosetta using the Monte Carlo Method. Here random movement of individual atoms occurs, and the energy is estimated after each step.

Even though both methods use physical constraints to find plausible protein structures, neither of them actually simulates the behavior of these molecules within a physical force field. Moreover, neither method necessarily outputs fully relaxed protein structures, and both simulate water only implicitly by preferring hydrophilic parts of the protein to be on the outside. Thus, we conducted a molecular dynamics (MD) simulation to verify the plausibility of our protein structure and to allow equilibration. The molecular dynamics simulation provides the opportunity to simulate water as discrete molecules, creating a solvated protein. This step is crucial to validate the structures, as the interaction with water is one of the primary mechanisms of protein folding. Since neither candidate CASP12 nor S_14771 has been modeled with explicit water, a corresponding MD simulation is imperative to verify the correctness of the candidates' conformations. This is, of course, much more expensive in terms of computational resources, as the protein has to be placed in a simulation box that is then filled with water molecules. This is called solvation and is visualized for candidate S_14771 in figure eeeeee.

Figure x : Sortase A7M in a force field surrounded by discrete water molecules. Image was made with gmxSolvate.

We used GROMACS (GROningen MAchine for Chemical Simulations) as the tool for our molecular dynamics simulations. GROMACS solves Newton's equations of motion for individual atoms [1]. While this classical simulation is much more accurate than the predictions made by the other methods, approximations are used nonetheless: forces are cut off beyond a certain radius and the system size is quite small [1]. Additionally, atoms are assumed to be classical particles, although quantum mechanics plays a role in particle-particle interactions. Even so, the simulation is computationally very expensive. Therefore, only time periods far shorter than one second could be simulated.
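
To illustrate what numerically solving Newton's equations of motion means, the following sketch integrates a single particle in one dimension with the velocity Verlet scheme. It is not GROMACS code and makes no claim about the integrator settings of our runs; the harmonic force, the mass and the time step are assumptions chosen so that the result can be compared with the analytical solution x(t) = cos(t).

```python
import math


def velocity_verlet(x, v, force, mass, dt, steps):
    """Integrate m * a = F(x) for one particle in one dimension."""
    a = force(x) / mass
    trajectory = [x]
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt * dt  # position update
        a_new = force(x) / mass             # force at the new position
        v = v + 0.5 * (a + a_new) * dt      # velocity update with averaged acceleration
        a = a_new
        trajectory.append(x)
    return trajectory, v


def harmonic_force(x, k=1.0):
    """Hypothetical spring force standing in for a molecular force field."""
    return -k * x


trajectory, v_end = velocity_verlet(x=1.0, v=0.0, force=harmonic_force,
                                    mass=1.0, dt=0.01, steps=1000)
total_energy = 0.5 * v_end ** 2 + 0.5 * trajectory[-1] ** 2
print(trajectory[-1], math.cos(10.0), total_energy)
```

The total energy stays close to its initial value of 0.5, which is the property that makes this family of integrators suitable for long simulations.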


To perform the molecular dynamics simulations we mostly followed the GROMACS Lysozyme tutorial, as it serves our purpose well. We created a dodecahedral simulation box with a distance of 0.7 nm between the solute and the box borders. We used periodic boundary conditions and a Na+ Cl- concentration of 0.012 mol/L. The main difference in our approach was that we used the CHARMM36 force field instead of the OPLS-AA/L force field and adjusted our molecular dynamics parameters accordingly. The simulation was performed on an NVIDIA GTX 760 graphics card, allowing us to simulate approximately 1 ns per hour.
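
As a rough plausibility check of these settings, the number of ion pairs implied by a salt concentration and the wall-clock time implied by the simulation speed can be estimated with a few lines of arithmetic. The box volume of 300 nm³ is a hypothetical value used only for illustration, as the actual box volume is not stated here; the simulated time of 200 ns corresponds to the 200,000 ps shown in the RMSD plot of candidate S_14771.

```python
AVOGADRO = 6.02214076e23   # particles per mol
LITRE_PER_NM3 = 1e-24      # 1 nm^3 expressed in litres


def ion_pairs(concentration_mol_per_l, box_volume_nm3):
    """Expected number of ion pairs in the box at the given concentration."""
    return concentration_mol_per_l * box_volume_nm3 * LITRE_PER_NM3 * AVOGADRO


def wall_clock_hours(sim_time_ns, ns_per_hour):
    """Time needed to simulate sim_time_ns at the given speed."""
    return sim_time_ns / ns_per_hour


# hypothetical box volume of 300 nm^3
print(round(ion_pairs(0.012, 300.0), 2))
print(wall_clock_hours(200.0, 1.0))
```

At roughly 1 ns per hour, a 200 ns trajectory therefore takes on the order of 200 hours, a little more than eight days.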

To analyse the MD simulation we used the Python programming language and the Biotite package as well as GROMACS analysis tools such as covar and anaeig. The first analyses are a root-mean-square deviation (RMSD), a root-mean-square fluctuation (RMSF) and a radius of gyration analysis. RMSD calculations have been described in the structure prediction section. To compute the RMSF, the movement distance of each residue is computed as a root mean square over time:

$$\mathrm{RMSF}_i = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left( v_i(t) - \langle v_i \rangle \right)^2}$$

where v_i(t) is the position of atom i at time t, ⟨v_i⟩ is its position averaged over time and N is the number of time steps. The radius of gyration is the mass-weighted root-mean-square distance of the atoms from the center of mass of the protein. The final analysis performed on the MD simulation is called Principal Component Analysis (PCA). By applying PCA to a protein it is possible to gain insights into the relevant vibrational motions and thereby the physical mechanism of the protein.
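
The two quantities can be sketched in plain Python as follows. This is a minimal illustration of the definitions and not the Biotite-based analysis script; it assumes that all frames have already been superimposed onto a common reference, and the coordinates in the example are made up.

```python
import math


def rmsf(trajectory):
    """Per-atom RMSF; trajectory[t][i] is the (x, y, z) position of atom i in frame t."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        # time-averaged position of atom i
        mean = [sum(frame[i][d] for frame in trajectory) / n_frames for d in range(3)]
        squared = sum(
            sum((frame[i][d] - mean[d]) ** 2 for d in range(3)) for frame in trajectory
        )
        result.append(math.sqrt(squared / n_frames))
    return result


def radius_of_gyration(coords, masses):
    """Mass-weighted root-mean-square distance from the center of mass."""
    total_mass = sum(masses)
    center = [sum(m * c[d] for m, c in zip(masses, coords)) / total_mass for d in range(3)]
    squared = sum(
        m * sum((c[d] - center[d]) ** 2 for d in range(3)) for m, c in zip(masses, coords)
    )
    return math.sqrt(squared / total_mass)


# made-up example: atom 0 is static, atom 1 oscillates along x
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
print(rmsf(frames))
print(radius_of_gyration([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [1.0, 1.0]))
```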


The first possible indicators of a stable protein structure are a converging root-mean-square deviation (RMSD), small root-mean-square fluctuation (RMSF) values and a converging radius of gyration. Using Python and the Biotite package we calculated these quantities and plotted the results for both candidate S_14771 and candidate CASP12.

Figure x : The RMSD is one of three main indicators of a stable protein structure, shown here for the MD simulation of S_14771 over a period of 200,000 ps. As time progresses, the RMSD increases with a decreasing slope. The value stabilizes at about 110,000 ps and fluctuates around 6 Å.

Figure x : At t = 40,000 ps the RMSD has already arrived at a stable value, while the radius of gyration (fig x) continues to decrease over time. This suggests the protein might be folding and potentially developing secondary structures not present previously.

Figure x : The prominent fluctuations of residues 105 to 115 might indicate a binding site or another form of functional structure. The radius of gyration, just like the RMSD (fig xyz), stabilizes around a simulation time of 110,000 ps and converges towards a value of 16.7 Å.

Figure x : From t = 40,000 ps onward the radius of gyration decreases steadily. At the end of the simulation the radius of gyration reaches a value of 17 Å. This behavior indicates folding of the protein structure.

Figure x : The fluctuations (RMSF) of most residues appear insignificant compared to those of the first and last residues and the residues close to residue 110. Typically the N- and C-termini tend to fluctuate more strongly due to the lack of stabilizing structures. The prominent fluctuations in the range of residues 105 to 115 can indicate a binding site or another form of functional structure.


Typical RMSDs and radii of gyration converge towards a value dependent on the size of the protein. Convergence of those quantities can be interpreted as a stable state of the protein structure. As can be seen in figures x and y, both the RMSD and the radius of gyration stabilize at the same time, as the simulation reaches 110,000 ps (110 ns), suggesting a now stabilized structure of candidate S_14771 solvated in water. Another indicator of a functional protein is the RMSF. Instead of being averaged over all atoms, the RMSF is averaged over time with respect to each amino acid. It provides insights into both protein stability and functionality. Fig xzf reveals the RMSF of residues 105 to 115 to be significantly higher than that of other residues. This hints at the presence of a functional unit along these residues. As commented on in the section describing our structure prediction approaches, the N- and C-terminal regions tend to fluctuate more strongly as a result of the absence of stabilizing structures.

RMSD and radius of gyration calculations for candidate CASP12 (figures x and y) provide evidence of folding. However, the RMSF values are significantly higher, an effect possibly caused by instability or refolding. Nevertheless, the strongest fluctuations, disregarding the terminal regions, can be seen in the region of residues 105 to 115. This insight supports the theory that residues 105 to 115 might be part of a functional unit.

We were unsure whether candidate CASP12 can be considered a plausible structure and how to interpret the findings concerning the prominent fluctuations. Therefore, we decided to perform a Principal Component Analysis.

Principal component analysis

To analyze our system further, a Principal Component Analysis (PCA) was performed using GROMACS.

Animation 4: A Principal Component Analysis of a fast (blue) and a slow (red) mode showing the most prominent movements of the Cα-chain of candidate S_14771. Both modes show movement of the β6/β7 loop consisting of residues 105 to 115 towards the active site. Thus we can assume that the closing β6/β7 loop is involved in the reaction mechanism.

The results from the Principal Component Analysis of candidate S_14771 (animation xy) show a movement of residues 105 to 115 towards the active site, supporting our theory that residues 105 to 115 are important for the reaction mechanism. Since the slow mode (red), which shows the most relevant movement of the sortase, moves further towards the active site, it is possible that the β6/β7 loop either closes the binding site of the ligand peptides or even transports one peptide towards the other.

Animation xyz shows the results of the Principal Component Analysis of candidate CASP12. As the RMSF calculations suggested (fig xyz), the whole protein seems to be moving randomly with no directed movement. In addition, the active site amino acids are spread across the protein, confirming our assumption that the protein is not in a stable or plausible conformation.
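
The idea behind such an analysis can be sketched in a few lines: the covariance matrix of the atomic coordinates over all frames is computed, and its eigenvector with the largest eigenvalue describes the dominant collective motion. The sketch below is not the GROMACS implementation; it finds only the leading mode by power iteration, and the two-coordinate "trajectory" is a made-up example in which all motion happens along the first coordinate.

```python
import math


def covariance_matrix(frames):
    """Covariance of the coordinates; frames[t] is the flattened coordinate vector of frame t."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(dim)]
    centered = [[f[j] - mean[j] for j in range(dim)] for f in frames]
    return [[sum(c[a] * c[b] for c in centered) / n for b in range(dim)]
            for a in range(dim)]


def leading_mode(cov, iterations=200):
    """Largest eigenvalue and its eigenvector, found by power iteration."""
    dim = len(cov)
    vec = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        new = [sum(cov[a][b] * vec[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:
            break  # no variance at all
        vec = [x / norm for x in new]
    eigenvalue = sum(vec[a] * sum(cov[a][b] * vec[b] for b in range(dim))
                     for a in range(dim))
    return eigenvalue, vec


# made-up example: motion only along the first coordinate
frames = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
eigenvalue, mode = leading_mode(covariance_matrix(frames))
print(eigenvalue, mode)
```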


By applying various methods to analyse the structural stability and validity of our two Sortase A7M candidates, we gained evidence that at least one of our models is a valid and stable candidate. The candidate S_14771, which was generated using RosettaCM, appears to be a fitting candidate not only due to the successful analyses, but also because the residues of the active site are close enough to each other to catalyze a ligation reaction. Our model created through deep learning performed well only in terms of the RMSD and radius of gyration calculations. The RMSF and the Principal Component Analysis as well as the conformation of the active site indicate that candidate CASP12 is of no use for further calculations, as it does not portray a valid conformation of Sortase A7M.


Now that the binding site of the Sortase had been found, the peptide ligand needed to be inserted into the binding site to create a peptide-protein complex. The procedure of choice for the introduction of a ligand into the binding site of a protein is called docking. In the following sections, we will present the protocol and methods we used as well as the results they yielded.


Enzymes are among the most relevant macromolecules in biology. Their functionality is determined by the way they interact with their ligands. Although enzymes are highly specific concerning the ligands they interact with, similar compounds can often bind to the same enzyme, albeit with different affinity. To determine the best possible binding conformation of the protein-ligand complex, we use FlexPepDock, an algorithm provided by the RosettaCommons software package.


The ab-initio FlexPepDock protocol consists of multiple steps and is documented on the RosettaCommons online documentation. We modified the protocol as the one provided did not work with our approach. The modified protocol has the following form:

  1. secondary structure determination
  2. complex creation
  3. FlexPepDock refinement

To determine the secondary structure of the peptide, fragment files (3- and 5-mers) had to be generated and a PSIPRED secondary structure prediction had to be performed. As the peptides had a sequence length less than 20 amino acids, we were not able to use the online services such as Robetta and the PSIPRED online service. Instead we used the Rosetta FragmentPicker application and the PSIPRED command line tool. The generated structures serve as the input for the refinement protocol.
The generation of the peptide-protein complex can be divided into three steps:

  • peptide creation
  • peptide relaxation
  • coarse complex creation

The peptide structure was created through ab-initio modeling. Initial creation of the peptide was followed by insertion of the peptide into the sortase binding site. This led to a coarse model of the peptide-sortase complex. Here we used insight gained from the molecular dynamics simulation to place the peptide close to the binding site.
In the final step the FlexPepDock refinement protocol is executed and 50,000 complex structures are generated. We used the inputs as described in {{fuhrman paper}}, written by the authors of the FlexPepDock documentation.
To get a better overview of our data we performed a clustering in Python, using the scikit-learn package. We clustered the structures with respect to:

  • total score: the total score of the docking provided by the Rosetta scoring function
  • interface score: the sum of the energy of the residues in the interfacing region
  • reweighted score: a score calculated by double weighting the contribution of the residues in the interfacing region
  • root-mean-square deviation: the root-mean-square deviation of the peptides in relation to the structure with the highest score
  • peptide direction: the direction the peptide is facing
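
Of the features listed above, the root-mean-square deviation is the one that is computed directly from coordinates. A minimal sketch of this calculation is given below; it assumes that both peptides contain the same atoms in the same order and that the complexes are already superimposed. The coordinates are made up.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) positions."""
    if len(coords_a) != len(coords_b):
        raise ValueError("both structures must contain the same number of atoms")
    squared = sum((a[d] - b[d]) ** 2
                  for a, b in zip(coords_a, coords_b)
                  for d in range(3))
    return math.sqrt(squared / len(coords_a))


# made-up coordinates of a reference peptide and a second docking result
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
other = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
print(rmsd(reference, other))
```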

Here clustering is used to group the docking results and thereby decrease the sample size. From the 50,000 results we picked the results with the 500 best total scores, the 500 best interface scores and the 500 best reweighted scores. As we aimed to create an unbiased set for clustering, we ensured that the set contained no duplicates. We decreased the sample size to 100 groups representing the best scoring structures from the three categories.
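
The selection step can be sketched as follows. The record layout (a dictionary with a `name` and one entry per score) and the example values are assumptions made for illustration; the sketch only covers picking the best entries per score and removing duplicates, not the subsequent scikit-learn clustering. Lower Rosetta scores are better, so each list is sorted in ascending order.

```python
def select_candidates(results, score_keys, n_best=500):
    """Union of the n_best entries per score, each structure kept only once."""
    selected = {}
    for key in score_keys:
        for entry in sorted(results, key=lambda r: r[key])[:n_best]:
            selected[entry["name"]] = entry  # dictionary keys remove duplicates
    return list(selected.values())


# made-up docking results, for illustration only
results = [
    {"name": "a", "total": -10.0, "interface": -1.0, "reweighted": -12.0},
    {"name": "b", "total": -9.0, "interface": -5.0, "reweighted": -11.0},
    {"name": "c", "total": -3.0, "interface": -6.0, "reweighted": -4.0},
    {"name": "d", "total": -2.0, "interface": -2.0, "reweighted": -3.0},
]
picked = select_candidates(results, ["total", "interface", "reweighted"], n_best=2)
print(sorted(entry["name"] for entry in picked))
```

With three scores and 500 picks each, the resulting set contains at most 1,500 structures and fewer whenever a structure ranks among the best in more than one category.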


For each of the sequences MGGGGPPPPPP (M-polyG), GGGGPPPPPP (polyG) and PPPPPPLPETGG (LPETGG), 50,000 structures were created and clustered. After the clustering the sample consisted of 100 structures of docked complexes.

Figure x : The three best scoring structures (total score, interface score, reweighted score) of the LPETGG-tag are shown. Only two results are visible as the best reweighted score candidate is identical to the best interface score candidate. The reacting section of the LPETGG-tag namely glycine is colored yellow as is the active site. The glycin of both ligand peptides is facing the active site.

Analysis of the scores has shown a similar score for all the three dockings. The best scoring results of the LPETGG docking show a tendency of the glycines to face the active site while also being in close proximity to the active site.

Figure x : The three best scoring structures (total score, interface score, reweighted score) of the poly-g peptide are shown. Only two results are visible as the best reweighted score candidate is identical to the best interface score candidate. Instead of facing the active site (yellow) the reacting glycines (yellow) appear to interact with the β6/β7 loop of the sortase.

Figure x : The three best scoring structures (total score, interface score, reweighted score) of the poly-g peptide are shown. Only two results are visible as the best reweighted score candidate is identical to the best interface score candidate. Concerning the M-poly-G peptide no uniform directional orientation can be observed. The structure with the best interface score (light blue) is oriendted towards the loop while the structure with the best total/reweighted (dark blue) is oriented towards the β-sheets.

Figure lpetgg shows the docking result of the LPETGG peptide to the sortase. The results shown are the best scoring structures of the clustering with respect to the total score, interface score and reweighted score. As the best scoring structure is the same for the total score and the reweighted score only two peptides are shown. This also applies to figures x and y. For both results the reacting glycin residues (yellow) are facing the active site. Additionally, the same residues are in close proximity to the active site.

Figures x and y show the docking of polyG and M-polyG. While the polyG results align well and seem to interact with the β6/β7 loop rather than with the active site, this is not the case for M-polyG. Instead of both structures interacting with the same region, one (best interface score; dark blue) interacts with the β6/β7 loop and the other (best reweighted/total score; light blue-gray) appears to interact with the active site.

Figure x: Close-up of the M-polyG peptide (best total/reweighted score), indicating an interaction of methionine with arginine139 and cysteine126.

Figure x: In the result with the best interface score, methionine interacts with the β6/β7 loop rather than the active site. The reactive glycine residues still appear to be bound to the β6/β7 loop.

Visualizing the result of the docking simulation with the best total/reweighted score, as can be seen in figure 16, suggests an interaction between methionine and two active site residues, namely arginine139 and cysteine126. Figure 17 shows the interaction of M-polyG with the β6/β7 loop. The glycines still interact with the β6/β7 loop, but instead of binding above the loop, as is the case for polyG (fig z), the binding seems to be influenced by methionine. By interacting with residues in the β-sheet, methionine could potentially hinder binding of glycine to the β6/β7 loop by partially immobilizing the peptide. Overall, peptide binding and orientation are less uniform than for polyG without the leading methionine, which could indicate a lower binding affinity of M-polyG towards the β6/β7 loop.


To computationally investigate the binding affinities of the polyG, M-polyG and LPETGG tags, we performed docking simulations using the Rosetta FlexPepDock application. We used a modified version of the recommended protocol because it was easier to automate and served our purpose better than the standard protocol. From the calculated scores alone, we could not see a difference in binding affinities. We therefore inspected the best-scoring structures with respect to total score, interface score and reweighted score using PyMOL. Since the best structures with respect to total score and reweighted score were the same in all simulations, only two structures were inspected per run. A polyproline tag was appended to all peptides to simulate the modification of the VLPs with a small peptide.
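
Selecting the best structure per score term can be sketched in a few lines of Python. This is a minimal illustration and not the script used in this project: it assumes a Rosetta-style score file in which every model is a `SCORE:` line, and the column names `total_score`, `I_sc` (interface score) and `reweighted_sc` are assumptions about the FlexPepDock output. The function names and the example numbers are hypothetical and made up for illustration only.

```python
from typing import Dict, List

# Assumed column names for total, interface and reweighted score.
SCORE_COLUMNS = ("total_score", "I_sc", "reweighted_sc")


def parse_score_file(text: str) -> List[Dict[str, str]]:
    """Parse Rosetta-style 'SCORE:' lines into one dict per model."""
    header = None
    rows = []
    for line in text.splitlines():
        if not line.startswith("SCORE:"):
            continue  # skip other lines such as 'SEQUENCE:'
        fields = line.split()[1:]
        if header is None:
            header = fields  # the first SCORE: line names the columns
            continue
        if len(fields) != len(header):
            continue  # ignore truncated or malformed lines
        rows.append(dict(zip(header, fields)))
    return rows


def best_models(rows: List[Dict[str, str]], columns=SCORE_COLUMNS) -> Dict[str, str]:
    """Return the model with the lowest (best) value for each score column."""
    best = {}
    for col in columns:
        winner = min(rows, key=lambda r: float(r[col]))
        best[col] = winner["description"]
    return best


# Made-up numbers for illustration only; these are not project results.
EXAMPLE = """\
SEQUENCE:
SCORE: total_score I_sc reweighted_sc description
SCORE: -310.2 -18.4 -402.1 model_0001
SCORE: -315.7 -17.9 -409.8 model_0002
SCORE: -301.5 -21.3 -395.0 model_0003
"""

if __name__ == "__main__":
    picks = best_models(parse_score_file(EXAMPLE))
    print(picks)
    # Distinct structures that would need visual inspection in PyMOL.
    print(sorted(set(picks.values())))
```

In the made-up example, the same model wins for total score and reweighted score, so only two distinct structures remain for inspection, which mirrors the situation described above.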

As expected, the results showed that for LPETGG the glycines of both peptides were oriented towards the active site. This is unsurprising, as peptides with the sequence LPXTGG are known substrates of the sortase. It was more surprising to see the polyG tag oriented away from the active site, since polyG is also a known substrate of the sortase. Both polyG peptides uniformly faced the β6/β7 loop (residues 105 to 115) and appeared to interact with it. The M-polyG peptides did not show a uniform orientation or interaction scheme. On the one hand, the visualization of the best result by total and reweighted score showed an interaction of methionine with cysteine126 and arginine139, two residues of the active site. On the other hand, the visualization of the best result by interface score showed M-polyG facing the mobile β6/β7 loop. In contrast to the polyG peptide lacking the methionine, the M-polyG peptide is pulled down below the β6/β7 loop by the methionine interacting with one of the β-sheets leading to the active site. This is not the case for the polyG results, which lie in one plane with the β6/β7 loop.

Modeling Conclusion

generic filler text