Staple subunit: GCN4

GCN4 is a yeast transcription factor belonging to the bZIP family of DNA-binding proteins. We used GCN4 to study DNA binding by our "Mini staples", which bring two DNA target sites into proximity by binding them simultaneously.

Beyond the well-studied linear DNA sequence, the 3D spatial organization of DNA plays a crucial role in gene regulation, cell fate, disease development and more. However, tools to precisely manipulate this genomic architecture remain limited, making it challenging to explore the full potential of the 3D genome in synthetic biology. We, iGEM Team Heidelberg 2024, have developed PICasSO, a powerful molecular toolbox based on various DNA-binding proteins, to address this issue.

The PICasSO part collection offers a comprehensive, modular platform for precisely manipulating and reprogramming DNA-DNA interactions in living cells using protein staples. It enables researchers to recreate natural 3D genomic interactions, such as enhancer hijacking, or to design entirely new spatial architectures for gene regulation. Specifically, fusing two DNA-binding proteins makes it possible to artificially bring distant genomic loci into proximity. To unlock the system's full potential, we introduce versatile chimeric CRISPR/Cas complexes, connected either at the protein or at the guide RNA level. These complexes are referred to as protein staples or Cas staples. Beyond its versatility, PICasSO includes robust assay systems that support the engineering, optimization and testing of new staples, ensuring functionality in vitro and in vivo. We took special care to include parts for testing every step of the engineering cycle (design, build, test, learn) when developing new parts.

At its heart, the PICasSO part collection consists of three categories.
(i) Our DNA-binding proteins include our finalized enhancer hijacking Cas staple as well as half staples that can be used by scientists to compose entirely new Cas staples in the future. We also include our Simple staples that serve as controls for successful stapling and can be further engineered to create alternative, simpler and more compact staples.
(ii) As functional elements, we list additional parts that enhance the functionality of our Cas and Basic staples. These consist of protease-cleavable peptide linkers and inteins that allow condition-specific, dynamic stapling in vivo. Besides staple functionality, we also include the parts to enable the efficient delivery of PICasSO's constructs with our interkingdom conjugation system.
(iii) As the final category of our collection, we provide parts that support the use of our custom readout systems. These include components of our established FRET-based proximity assay system, enabling users to confirm accurate stapling. Additionally, we offer a complementary, application-oriented testing system for functional readouts via a luciferase reporter, which allows for straightforward experimental simulation of enhancer hijacking in mammalian cells.

The following table gives a comprehensive overview of all parts in our PICasSO toolbox. The highlighted parts showed exceptional performance as described on our iGEM wiki and can serve as a reference. The other parts in the collection are versatile building blocks designed to provide future iGEMers with the flexibility to engineer their own custom Cas staples, enabling further optimization and innovation.

Our part collection includes:

DNA-binding proteins: The building blocks for engineering of custom staples for DNA-DNA interactions with a modular system ensuring easy assembly.
BBa_K5237000	fgRNA Entry vector MbCas12a-SpCas9	Entry vector for simple fgRNA cloning via SapI
BBa_K5237001	Staple subunit: dMbCas12a-Nucleoplasmin NLS	Staple subunit that can be combined with sgRNA or fgRNA and dCas9 to form a functional staple
BBa_K5237002	Staple subunit: SV40 NLS-dSpCas9-SV40 NLS	Staple subunit that can be combined with a sgRNA or fgRNA and dCas12a to form a functional staple
BBa_K5237003	Cas Staple: SV40 NLS-dMbCas12a-dSpCas9-Nucleoplasmin NLS	Functional Cas staple that can be combined with sgRNA or fgRNA to bring two DNA strands into close proximity
BBa_K5237004	Staple subunit: Oct1-DBD	Staple subunit that can be combined to form a functional staple, for example with TetR. Can also be combined with a fluorescent protein as part of the FRET proximity assay
BBa_K5237005	Staple subunit: TetR	Staple subunit that can be combined to form a functional staple, for example with Oct1. Can also be combined with a fluorescent protein as part of the FRET proximity assay
BBa_K5237006	Simple staple: TetR-Oct1	Functional staple that can be used to bring two DNA strands in close proximity
BBa_K5237007	Staple subunit: GCN4	Staple subunit that can be combined to form a functional staple, for example with rGCN4
BBa_K5237008	Staple subunit: rGCN4	Staple subunit that can be combined to form a functional staple, for example with GCN4
BBa_K5237009	Mini staple: bGCN4	Assembled staple with minimal size that can be further engineered
Functional elements: Protease-cleavable peptide linkers and inteins are used to control and modify staples for further optimization for custom applications
BBa_K5237010	Cathepsin B-cleavable Linker: GFLG	Cathepsin B-cleavable peptide linker that can be used to combine two staple subunits to make responsive staples
BBa_K5237011	Cathepsin B Expression Cassette	Expression Cassette for the overexpression of cathepsin B
BBa_K5237012	Caged NpuN Intein	A caged NpuN split intein fragment that undergoes protein trans-splicing after protease activation. Can be used to create functionalized staple units
BBa_K5237013	Caged NpuC Intein	A caged NpuC split intein fragment that undergoes protein trans-splicing after protease activation. Can be used to create functionalized staple units
BBa_K5237014	fgRNA processing cassette	Processing cassette that produces multiple fgRNAs from one transcript and can be used for multiplexed 3D genome reprogramming
BBa_K5237015	Intimin anti-EGFR Nanobody	Enables interkingdom conjugation between bacteria and mammalian cells as an alternative delivery tool for large constructs
BBa_K4643003	incP origin of transfer	Origin of transfer that can be cloned into the plasmid vector and used for conjugation as a means of delivery
Readout Systems: FRET and enhancer recruitment to measure proximity of stapled DNA in bacterial and mammalian living cells enabling swift testing and easy development for new systems
BBa_K5237016	FRET-Donor: mNeonGreen-Oct1	FRET donor fluorophore fused to Oct1-DBD that binds to the Oct1 binding cassette. Can be used to visualize DNA-DNA proximity
BBa_K5237017	FRET-Acceptor: TetR-mScarlet-I	Acceptor part for the FRET assay binding the TetR binding cassette. Can be used to visualize DNA-DNA proximity
BBa_K5237018	Oct1 Binding Cassette	DNA sequence containing 12 Oct1 binding motifs, compatible with various assays such as the FRET proximity assay
BBa_K5237019	TetR Binding Cassette	DNA sequence containing 12 TetR binding motifs, can be used for different assays such as the FRET proximity assay
BBa_K5237020	Cathepsin B-Cleavable Trans-Activator: NLS-Gal4-GFLG-VP64	Readout system that responds to protease activity. It was used to test the cathepsin B-cleavable linker
BBa_K5237021	NLS-Gal4-VP64	Trans-activating enhancer, that can be used to simulate enhancer hijacking
BBa_K5237022	mCherry Expression Cassette: UAS, minimal promoter, mCherry	Readout system for enhancer binding. It was used to test the cathepsin B-cleavable linker
BBa_K5237023	Oct1 - 5x UAS binding cassette	Oct1 and UAS binding cassette that was used for the simulated enhancer hijacking assay
BBa_K5237024	TRE-minimal promoter- firefly luciferase	Contains Firefly luciferase controlled by a minimal promoter. It was used as a luminescence readout for simulated enhancer hijacking

1. Sequence overview

Assembly compatibility: RFC[10], RFC[12], RFC[21], RFC[23], RFC[25] and RFC[1000].

2. Usage and Biology

GCN4 is a yeast transcription factor from the bZIP family of DNA-binding proteins, a family first described by McKnight and co-workers in 1988. The bZIP motif combines a coiled-coil leucine zipper dimerization domain with a highly charged basic region that binds DNA. GCN4 binds DNA as a homodimer and specifically recognizes the cyclic AMP response element (CRE) sequence (5' ATGACGTCAT 3') in the promoter regions of target genes, primarily through the basic residues of its N-terminal region.
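The CRE site is palindromic: its reverse complement is identical to itself, and its two half-sites (ATGAC and GTCAT) are reverse complements of each other. This twofold symmetry fits binding by a GCN4 homodimer in which each basic region contacts one half-site. The following minimal Python sketch checks this and scans a probe for the site; the helper names and the flanking bases of the example probe are hypothetical and not part of our workflow.

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def half_sites(site: str) -> tuple[str, str]:
    """Split an even-length binding site into its 5' and 3' halves."""
    mid = len(site) // 2
    return site[:mid], site[mid:]

def find_sites(probe: str, site: str) -> list[int]:
    """0-based start positions of `site` on the given strand."""
    probe, site = probe.upper(), site.upper()
    return [i for i in range(len(probe) - len(site) + 1)
            if probe[i:i + len(site)] == site]

CRE = "ATGACGTCAT"
left, right = half_sites(CRE)
print(reverse_complement(CRE) == CRE)   # True: the site is palindromic
print(left, reverse_complement(right))  # ATGAC ATGAC
# Hypothetical probe: CRE flanked by arbitrary GG/GG bases.
print(find_sites("GGATGACGTCATGG", CRE))  # [2]
```

Because the site reads the same on both strands, scanning one strand is enough to locate it in a probe.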

In our project, we fused GCN4 to rGCN4 (BBa_K5237008) to create a 150-amino-acid "Mini staple" that can bring two DNA target sites into close proximity.

The DNA-binding properties of GCN4 were tested using an electrophoretic mobility shift assay (EMSA) to quantify its binding affinity. EMSA is a widely adopted method to study DNA-protein interactions. It relies on the principle that nucleic acids bound to proteins migrate more slowly through a gel than unbound nucleic acids (Hellman & Fried, 2007). EMSA can be used both qualitatively, to assess DNA-binding capability, and quantitatively, to determine parameters such as binding stoichiometry and the apparent dissociation constant (K_d) (Fried, 1989).

3. Assembly and part evolution

The GCN4 amino acid sequence was taken from the literature (Hollenbeck et al., 2001) and codon-optimized for E. coli. A FLAG-tag (DYKDDDDK) was added to the N-terminus for protein purification; if necessary, it can be cleaved off with enterokinase. The FLAG-GCN4 sequence was cloned into a T7 expression vector and expressed in E. coli BL21 (DE3) cells.
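Enterokinase recognizes the DDDDK motif at the end of the FLAG-tag and cuts C-terminal to the lysine, so the downstream protein is released without tag residues. The minimal Python sketch below illustrates this on a hypothetical construct: the start methionine is an assumption, `<GCN4>` is a placeholder rather than the real sequence, and the function name is hypothetical. It only models the first DDDDK occurrence.

```python
FLAG_TAG = "DYKDDDDK"  # FLAG epitope as given in the text
EK_SITE = "DDDDK"      # enterokinase recognition motif; cut after the K

def enterokinase_cleave(protein: str) -> tuple[str, str]:
    """Split at the first DDDDK motif, C-terminal to its lysine.

    Returns (released N-terminal fragment, remaining protein).
    """
    pos = protein.find(EK_SITE)
    if pos == -1:
        raise ValueError("no enterokinase site (DDDDK) found")
    cut = pos + len(EK_SITE)
    return protein[:cut], protein[cut:]

# Hypothetical construct: assumed start Met + FLAG-tag + placeholder for GCN4.
construct = "M" + FLAG_TAG + "<GCN4>"
tag, product = enterokinase_cleave(construct)
print(tag, product)  # MDYKDDDDK <GCN4>
```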

4. Results

4.1 Protein expression and purification

FLAG-GCN4 was readily expressed in E. coli and purified using an anti-FLAG resin. Fractions taken during purification were analyzed by SDS-PAGE, and the concentration of the eluted protein was determined with a Lowry protein assay. The eluate had a concentration of 1.18 mg/mL, corresponding to 153 µM of monomeric FLAG-GCN4.
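Converting a mass concentration to a molar one only requires the monomer molecular weight: c [µM] = c [mg/mL] / MW [Da] × 10^6. The molecular weight of FLAG-GCN4 is not stated here, so the sketch below back-calculates it from the two reported numbers (about 7.7 kDa per monomer). Since the Lowry assay measures total protein, any co-purified protein is included in the mass concentration. Function names are hypothetical.

```python
def mg_per_ml_to_um(conc_mg_per_ml: float, mw_da: float) -> float:
    """Convert mg/mL (= g/L) to µM using the molecular weight in Da (g/mol)."""
    return conc_mg_per_ml / mw_da * 1e6  # (g/L)/(g/mol) = mol/L; x1e6 -> µM

def implied_mw_da(conc_mg_per_ml: float, conc_um: float) -> float:
    """Molecular weight (Da) implied by a mass and a molar concentration."""
    return conc_mg_per_ml / (conc_um * 1e-6)

mw = implied_mw_da(1.18, 153)            # values reported above
print(round(mw))                         # 7712, i.e. ~7.7 kDa per monomer
print(round(mg_per_ml_to_um(1.18, mw)))  # 153
```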

Figure 2: SDS-PAGE analysis of FLAG-GCN4 purification. Fractions analyzed for each protein are the raw lysate, flow-through and eluate. Shown are GCN4 (this part), rGCN4 (BBa_K5237008) and bGCN4 (BBa_K5237009). The protein size is indicated next to each construct name, and the band of the purified protein of interest is highlighted by a red box.

4.2 Electrophoretic mobility shift assay

Figure 3: Overview Image of Electrophoretic Mobility Shift Assay (EMSA)

The electrophoretic mobility shift assay (EMSA) is a widely adopted method to study DNA-protein interactions. It is based on the reduced electrophoretic mobility of protein-bound nucleic acids compared to free nucleic acids (Hellman & Fried, 2007). Mobility-shift assays can be used either qualitatively, to assess DNA-binding capability, or quantitatively, to determine binding stoichiometry and binding parameters such as the apparent dissociation constant (K_d) (Fried, 1989).

To analyze DNA-binding affinity, an EMSA was performed in which GCN4 was incubated in binding buffer with a 20 bp DNA probe containing the CRE GCN4 binding sequence (5' ATGACGTCAT 3') until equilibrium was reached. The protein-DNA complexes were then resolved by native PAGE, and the DNA bands were stained with SYBR Safe.
To further analyze DNA binding, quantitative shift assays were performed for GCN4 and rGCN4 (BBa_K5237008): 0.5 µM DNA was incubated with varying concentrations of protein until equilibrium was reached. After electrophoresis, bands were stained with SYBR Safe and quantified by pixel intensity. The resulting values were fitted to Equation 1, which describes the formation of a 2:1 protein-DNA complex:

$$\Theta_{\mathrm{app}} = \Theta_{\min} + \left(\Theta_{\max} - \Theta_{\min}\right)\frac{K_a^2\,[L]_{\mathrm{tot}}^2}{1 + K_a^2\,[L]_{\mathrm{tot}}^2} \qquad \text{(Equation 1)}$$

Here, [L]_tot is the total protein monomer concentration and K_a the apparent monomeric equilibrium association constant; the apparent dissociation constant is K_D = 1/K_a. Θ_min and Θ_max are the site-saturation values, which can be determined experimentally; for this experiment, they were set to 0 and 1, respectively.
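Setting K_a [L]_tot = 1 in Equation 1 gives Θ_app = Θ_min + (Θ_max − Θ_min)/2, so the apparent K_D = 1/K_a equals the total monomer concentration at half-maximal saturation. The sketch below shows one way to fit Equation 1 with SciPy's `curve_fit`, keeping Θ_min = 0 and Θ_max = 1 fixed as in this experiment. It is an illustration under assumptions, not our analysis script: the function names are hypothetical, and the data are synthetic (generated from the model with K_D = 0.3 µM plus Gaussian noise), not our measurements. Note that Equation 1 uses the total rather than the free protein concentration, an approximation that holds best when protein is in large excess over DNA.

```python
import numpy as np
from scipy.optimize import curve_fit

def theta_app(l_tot, k_a, theta_min=0.0, theta_max=1.0):
    """Equation 1: fractional saturation for 2:1 protein:DNA complex formation."""
    x = (k_a * l_tot) ** 2
    return theta_min + (theta_max - theta_min) * x / (1.0 + x)

def fit_kd(l_tot_um, theta_obs, kd_guess_um=0.5):
    """Fit K_a (Θ_min = 0, Θ_max = 1 fixed); return (K_D, SE of K_D) in µM."""
    popt, pcov = curve_fit(
        lambda l, k_a: theta_app(l, k_a),  # only K_a is a free parameter
        l_tot_um, theta_obs, p0=[1.0 / kd_guess_um],
    )
    k_a = popt[0]
    k_a_se = np.sqrt(pcov[0, 0])
    kd = 1.0 / k_a
    kd_se = k_a_se / k_a**2  # first-order error propagation for 1/K_a
    return kd, kd_se

# Synthetic demonstration data only (NOT measured values).
rng = np.random.default_rng(0)
l_tot = np.array([0.05, 0.1, 0.2, 0.3, 0.5, 0.8, 1.2, 2.0])  # µM monomer
theta = theta_app(l_tot, 1 / 0.3) + rng.normal(0.0, 0.02, l_tot.size)
kd, kd_se = fit_kd(l_tot, theta)
print(f"K_D = {kd:.3f} ± {kd_se:.3f} µM")
```

Fitting only K_a keeps the model identical to Equation 1 with the saturation values fixed; freeing Θ_min and Θ_max would require well-defined plateaus in the data.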

Figure 4: Quantitative assessment of binding affinity for GCN4 and rGCN4. Different protein concentrations were incubated with 0.5 µM DNA until equilibrium. After gel electrophoresis, the fraction bound was calculated by dividing the pixel intensity of the bound band by the summed pixel intensities of the bound and unbound bands. At least three separate measurements were conducted for each data point. Values are presented as mean ± SD.
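The fraction bound described in the caption is I_bound / (I_bound + I_unbound) per lane, averaged over replicates and reported as mean ± SD. A minimal Python sketch of that calculation follows; the intensities are hypothetical, and whether background was subtracted before quantification is not stated, so the example simply assumes background-corrected values.

```python
import statistics

def fraction_bound(bound_px: float, unbound_px: float) -> float:
    """Fraction of DNA in the shifted band: I_bound / (I_bound + I_unbound)."""
    total = bound_px + unbound_px
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return bound_px / total

def summarize(replicates: list[tuple[float, float]]) -> tuple[float, float]:
    """Mean and sample SD of the fraction bound over replicate lanes."""
    fractions = [fraction_bound(b, u) for b, u in replicates]
    return statistics.mean(fractions), statistics.stdev(fractions)

# Hypothetical background-corrected intensities (bound, unbound) for three lanes.
lanes = [(420.0, 580.0), (450.0, 550.0), (390.0, 610.0)]
mean, sd = summarize(lanes)
print(f"{mean:.2f} ± {sd:.2f}")  # 0.42 ± 0.03
```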

GCN4 binds to its optimal DNA-binding motif with an apparent dissociation constant K_D of (0.293 ± 0.033) × 10^-6 M, almost identical to that of rGCN4, K_D = (0.298 ± 0.030) × 10^-6 M. These values are roughly 3- to 10-fold higher than those reported in the literature ((9 ± 6) × 10^-8 M for GCN4 and (2.9 ± 0.8) × 10^-8 M for rGCN4; Hollenbeck et al., 2001). Part of this difference could stem from the lower sensitivity of SYBR Safe staining compared to radiolabeled oligonucleotides. Most likely, however, the protein concentration was overestimated: the SDS-PAGE analysis shows additional, lower-intensity bands, indicating that small amounts of unspecific proteins were co-purified and contributed to the measured protein concentration. Overestimating the protein concentration shifts the binding curve to higher apparent concentrations and thus inflates the apparent K_D.

Fusing a FLAG-tag to the N-terminus of a protein could decrease its binding affinity, for example through steric hindrance of the DNA interaction. However, the binding affinities of GCN4 and rGCN4 are practically the same. GCN4 contacts DNA through its N-terminal basic region, directly next to the FLAG-tag, whereas the basic region of rGCN4 lies at its C-terminus, far from the tag. This similarity therefore suggests that the FLAG-tag does not directly interfere with DNA binding. It may, however, affect dimerization, which is required for DNA binding. To investigate this further, the FLAG-tag could be cleaved off with enterokinase and any change in binding affinity analyzed. In addition, coiled-coil formation and the proportions of dimeric and monomeric protein could be examined by circular dichroism spectroscopy (Greenfield, 2006).

4.3 In Silico Characterization using DaVinci

We developed DaVinci, an in silico model for rapid engineering and optimization of our PICasSO system. DaVinci serves as a digital twin of PICasSO: it analyzes the forces acting on the system, refines experimental parameters and identifies optimal interactions between protein staples and target DNA. The model was calibrated using literature data and experimental affinity results from EMSAs with purified rGCN4.
DaVinci operates in three phases: static structure prediction, all-atom dynamics simulation, and long-range DNA dynamics simulation. We applied the first two phases to our components, allowing us to characterize the structure and dynamics of the DNA-binding interactions.
For our bivalent DNA-binding Mini staple (BBa_K5237009), which consists of GCN4 fused to rGCN4 (BBa_K5237008) via a GSG linker, we predicted the structure and binding affinity and tested various linker options. We evaluated the flexibility and rigidity of the constructs using the pLDDT values of the predictions, comparing flexible linkers such as (GGGGS)n with rigid linkers such as (EAAAK)n (Arai et al., 2001). Predictions were colored by pLDDT score, which provides insight into chain rigidity (Akdel et al., 2022; Guo et al., 2022). Construct C (Fig. 5) was tested in the wet lab as part of BBa_K5237009 but failed to bind DNA, presumably because its excessive rigidity prevented the subunits from dimerizing.
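A linker screen like the one described above starts from a panel of single-chain fusion sequences. The minimal Python sketch below builds such a panel from the repeat units named in the text, with the GSG linker as reference. The GCN4-linker-rGCN4 order follows the description above; `<GCN4>` and `<rGCN4>` are placeholders for the real sequences, the number of repeats is an arbitrary choice, and the function names are hypothetical.

```python
# Linker repeat units named in the text (Arai et al., 2001).
FLEXIBLE = "GGGGS"
RIGID = "EAAAK"

def repeat_linker(unit: str, n: int) -> str:
    """Return the linker (unit)n, e.g. (GGGGS)3."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return unit * n

def fuse(n_subunit: str, linker: str, c_subunit: str) -> str:
    """Single-chain fusion: N-terminal subunit, linker, C-terminal subunit."""
    return n_subunit + linker + c_subunit

def linker_panel(n_subunit: str, c_subunit: str, max_repeats: int = 3) -> dict[str, str]:
    """Named fusion variants, e.g. for submission to structure prediction."""
    panel = {"GSG": fuse(n_subunit, "GSG", c_subunit)}
    for name, unit in (("GGGGS", FLEXIBLE), ("EAAAK", RIGID)):
        for n in range(1, max_repeats + 1):
            panel[f"({name}){n}"] = fuse(n_subunit, repeat_linker(unit, n), c_subunit)
    return panel

# Placeholders only: substitute the real GCN4 and rGCN4 sequences.
variants = linker_panel("<GCN4>", "<rGCN4>")
for name, seq in variants.items():
    print(name, seq)
```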

Figure 5: Variation of linkers connecting our mini staples. Panels A (BBa_K5237007) and B (BBa_K5237008) show orientations of the leucine zipper, each bound to DNA. Panels C to I display linker variations colored by their pLDDT confidence score, which serves as a surrogate for chain flexibility (Akdel et al., 2022). Note that panels H and I are not bound to the second DNA strand. All structures were predicted using the AlphaFold server (Google DeepMind, 2024).
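AlphaFold reports per-residue pLDDT on a 0 to 100 scale and colors structures by four confidence bands: very high (above 90), confident (70 to 90), low (50 to 70) and very low (below 50). Using pLDDT as a surrogate for chain flexibility, as in the caption, a linker can be summarized by the mean pLDDT over its residues. The sketch below assumes the per-residue values have already been extracted from the prediction output; the values, residue window and function names are hypothetical, and assigning the exact boundary values 90/70/50 to the lower band is our own choice.

```python
import statistics

def plddt_band(plddt: float) -> str:
    """Map a pLDDT value to AlphaFold's four confidence bands."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"

def linker_confidence(per_residue_plddt: list[float], start: int, end: int) -> tuple[float, str]:
    """Mean pLDDT over residues start..end (0-based, end exclusive) and its band."""
    window = per_residue_plddt[start:end]
    if not window:
        raise ValueError("empty residue window")
    mean = statistics.mean(window)
    return mean, plddt_band(mean)

# Hypothetical values: well-predicted helices flanking a poorly defined 3-residue linker.
plddt = [92.0] * 10 + [45.0, 40.0, 48.0] + [91.0] * 10
print(linker_confidence(plddt, 10, 13))
```

A low mean pLDDT in the linker window would suggest a flexible linker, whereas consistently high values across the linker would point to a rigid connection.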

5. References

Akdel, M., Pires, D. E. V., Pardo, E. P., Janes, J., Zalevsky, A. O., Meszaros, B., Bryant, P., Good, L. L., Laskowski, R. A., Pozzati, G., Shenoy, A., Zhu, W., Kundrotas, P., Serra, V. R., Rodrigues, C. H. M., Dunham, A. S., Burke, D., Borkakoti, N., Velankar, S., … Beltrao, P. (2022). A structural biology community assessment of AlphaFold2 applications. Nature Structural & Molecular Biology, 29(11), 1056–1067. https://doi.org/10.1038/s41594-022-00849-w

Arai, R., Ueda, H., Kitayama, A., Kamiya, N., & Nagamune, T. (2001). Design of the linkers which effectively separate domains of a bifunctional fusion protein. Protein Engineering, Design and Selection, 14(8), 529–532. https://doi.org/10.1093/protein/14.8.529

Chen, X., Zaro, J. L., & Shen, W.-C. (2013). Fusion protein linkers: Property, design and functionality. Advanced Drug Delivery Reviews, 65(10), 1357–1369. https://doi.org/10.1016/j.addr.2012.09.039

Fried, M. G. (1989). Measurement of protein-DNA interaction parameters by electrophoresis mobility shift assay. Electrophoresis, 10(5–6), 366–376. https://doi.org/10.1002/elps.1150100515

Google DeepMind. (2024). AlphaFold Server. https://alphafoldserver.com/terms

Greenfield, N. J. (2006). Using circular dichroism spectra to estimate protein secondary structure. Nature Protocols, 1(6), 2876–2890. https://doi.org/10.1038/nprot.2006.202

Guo, H.-B., Perminov, A., Bekele, S., Kedziora, G., Farajollahi, S., Varaljay, V., Hinkle, K., Molinero, V., Meister, K., Hung, C., Dennis, P., Kelley-Loughnane, N., & Berry, R. (2022). AlphaFold2 models indicate that protein sequence determines both structure and dynamics. Scientific Reports, 12(1), 10696. https://doi.org/10.1038/s41598-022-14382-9

Hellman, L. M., & Fried, M. G. (2007). Electrophoretic mobility shift assay (EMSA) for detecting protein-nucleic acid interactions. Nature Protocols, 2(8), 1849–1861. https://doi.org/10.1038/nprot.2007.249

Hollenbeck, J. J., Gurnon, D. G., Fazio, G. C., Carlson, J. J., & Oakley, M. G. (2001). A GCN4 Variant with a C-Terminal Basic Region Binds to DNA with Wild-Type Affinity. Biochemistry, 40(46), 13833–13839.

Hollenbeck, J. J., & Oakley, M. G. (2000). GCN4 Binds with High Affinity to DNA Sequences Containing a Single Consensus Half-Site. Biochemistry, 39(21), 6380–6389. https://doi.org/10.1021/bi992705n

Part:BBa_K5237007