Part:BBa_K5088000

Protein tyrosine kinase - Promoter+5'UTR from T. kok-saghyz

Background

Motivation

**Figure 1:** Graphical abstract - from dandelion to natural rubber.

With our project Tarakate, we aim to explore the potential of the Russian dandelion (Taraxacum kok-saghyz) as a sustainable source of natural rubber. This plant, native to Kazakhstan, is notable for its ability to produce significant amounts of high-quality latex in its roots, a trait found in only a few species worldwide (1). Natural rubber is an essential raw material: global demand is roughly 15 million tons annually (2), and it is used in more than 50,000 products (3), ranging from tires to medical supplies. As demand continues to rise, the limitations of traditional rubber sources, such as the rubber tree (Hevea brasiliensis), have become increasingly apparent. The production of rubber from H. brasiliensis has led to significant environmental and economic challenges, including deforestation of approximately four million hectares of rainforest, labor exploitation, and vulnerability to diseases like South American leaf blight (4, 5, 6). These issues, coupled with the restriction of rubber tree cultivation primarily to tropical regions, underscore the urgent need for alternative rubber sources that can be cultivated in diverse climates and offer greater sustainability.

To harness the potential of T. kok-saghyz as a sustainable source of natural rubber, engineering priorities include optimizing biomass production and morphological traits, such as root architecture and seed size, to maximize yield, improve harvesting efficiency, and enhance overall agricultural practices for easier handling and processing. Besides targeting natural rubber production, there is also a focus on increasing the production of other valuable products like inulin (7), which can be used for the production of biofuel, and various bioactive compounds with potential applications in multiple industries (8). By developing alternative rubber sources like T. kok-saghyz, we aim to mitigate the environmental and economic impacts associated with traditional rubber production and contribute to a more sustainable future.

Lack of Endogenous Regulatory Parts

To successfully implement these engineering efforts, it is essential to have a set of well-characterized regulatory parts, including promoters, 5’ untranslated regions (UTRs), 3’ UTRs, and other genetic sequences that provide precise control over gene expression. These standardized components are fundamental for achieving reliable and predictable outcomes in any synthetic biology project.

With this in mind, we set out to evaluate the current repertoire of regulatory parts used in T. kok-saghyz (TKS). Our review revealed that most constructs developed for TKS rely heavily on a limited selection of regulatory elements, such as the Cauliflower mosaic virus 35S (CaMV 35S) promoter and the nopaline synthase (NOS) terminator.

Part Type	Part Name	Origin	References
Promoter + 5’UTR	35S	Cauliflower mosaic virus	(9, 10, 11, 12, 13, 14)
Promoter + 5’UTR	NOS	Agrobacterium tumefaciens	(9)
Promoter + 5’UTR	PEP16	Hevea brasiliensis	(15)
Promoter + 5’UTR	Ubiquitin4-2	Petroselinum crispum	(16)
Promoter	UBQ1	Arabidopsis thaliana	(10)
Promoter	U6-26	Arabidopsis thaliana	(10)
Promoter	CPTL1	Taraxacum kok-saghyz	(17)
3'UTR	35S	Cauliflower mosaic virus	(9, 10, 11, 12, 13, 14)
3'UTR	NOS	Agrobacterium tumefaciens	(9)
3'UTR	UBQ1	Arabidopsis thaliana	(10)
3'UTR	3A	Pisum sativum	(16)

**Table 1:** Regulatory elements identified in T. kok-saghyz literature.

To overcome this challenge, our project this year focused on developing a broader collection of regulatory parts specifically tailored for TKS. By increasing the diversity of genetic components, we aim to support more ambitious synthetic biology projects in this promising plant.

Political & Regulatory Aspect

To ensure that engineered plants can be utilized, it is crucial that they adhere to the regulatory standards of the countries in which they will be deployed. In the European Union (EU), regulations surrounding genetically modified organisms (GMOs) are particularly strict and have sparked intense debate. However, a relaxation of these stringent regulations may be on the horizon.
The EU Parliament recently adopted its position for negotiations with member states regarding the proposal on New Genomic Techniques (NGTs). Currently, all plants developed using NGTs are regulated under the same framework as GMOs. However, Members of the European Parliament support the creation of two distinct categories with corresponding regulatory frameworks for NGT plants. NGT plants deemed equivalent to conventionally bred plants (NGT 1) would be exempt from the GMO legislation, while NGT plants that do not meet this criterion (NGT 2) would still be subject to more stringent regulations.

If this proposal is enacted into European law, engineered plants intended for use within the EU must meet the criteria for NGT 1 status. To achieve this, it is crucial to prioritize the use of endogenous regulatory elements over synthetic ones to fulfill the strict requirements outlined in the proposal.

Plant Synthetic Biology

Gene Structure in Plants

Gene structure in plants, like in other eukaryotes, is characterized by a complex organization of coding and non-coding regions. Unlike the typically continuous coding sequences found in prokaryotes, plant genes consist of exons—both protein-coding and non-coding—separated by introns, which are removed during RNA splicing to produce mature mRNA.

Untranslated regions (UTRs) at both ends of the mRNA also play crucial roles. The 5' UTR influences translation initiation, while the 3' UTR affects mRNA stability and lifespan. These regions enable plants to finely tune protein production in response to environmental and developmental cues.

Promoters, located upstream of the coding sequence, are critical for initiating transcription. These regions contain specific sequences that regulate when, where, and how much a gene is expressed. The terminator region downstream ensures proper transcription termination and mRNA processing.

In contrast to prokaryotes, where genes are often organized in operons and transcribed as polycistronic mRNA, plant genes are transcribed independently. This complexity, including the roles of introns, UTRs, and promoters, is essential for regulating gene activity in processes like development, differentiation, and environmental response.

**Figure 2:** Schematic representation of gene expression in plants, illustrating the process from transcription to translation. RNA Polymerase II transcribes the DNA into a primary RNA transcript, which then undergoes processing, including 5' capping, 3' end processing, and polyadenylation. The pre-mRNA is spliced to remove introns and produce mature mRNA, which is exported from the nucleus to the cytoplasm. In the cytoplasm, the mRNA is translated into protein and directed to its subcellular location.

Transcription in Plants

In plants, gene activity is regulated at multiple levels, with transcriptional control being the primary and most studied mode of gene expression regulation. Central to this regulatory landscape are promoter architecture, cis-regulatory elements, and enhancer organization.

The promoter is a region of DNA located upstream of the transcription start site (TSS) and serves as the primary site for transcriptional regulation. The promoter architecture is characterized by a complex arrangement of core promoter elements, proximal promoter regions, and distal regulatory sequences, each playing a distinct role in transcription regulation.

**Figure 3:** Schematic representation of the gene regulatory region highlighting the promoter structure. The promoter spans approximately 2 kb and is divided into three main regions: the distal promoter, which may contain enhancer elements; the proximal promoter, which includes transcription factor binding sites; and the core promoter, which is essential for the initiation of transcription. The transcription start site is indicated by the arrow, with the downstream region representing the gene structure, including exons (yellow blocks) and additional regulatory elements.

The core promoter typically spans the region immediately upstream of the TSS and includes elements such as the TATA box, the Initiator (Inr) element, and the downstream promoter element (DPE). The TATA box is often found at a conserved position around 25-30 base pairs upstream of the TSS and is recognized by the TATA-binding protein (TBP), a subunit of the transcription factor IID (TFIID). The Inr element, located at the TSS, can act independently or in conjunction with the TATA box to facilitate the assembly of the pre-initiation complex (PIC). These core elements are crucial for the precise recruitment of RNA polymerase II and the initiation of transcription.

Beyond the core promoter, the proximal promoter region extends up to a few hundred base pairs upstream and contains binding sites for various transcription factors (TFs). These TFs interact with the basal transcription machinery, modulating the frequency and efficiency of transcription initiation. The binding of TFs to these sites can either activate or repress transcription, depending on the nature of the TFs and the context of their binding.

Distal regions, which may be located thousands of base pairs away from the TSS, also contribute to promoter activity. These regions often contain enhancers, silencers, or insulators, which can exert their regulatory effects through structural interaction with the promoter region, bringing TFs and co-regulators into close proximity with the PIC.

Cis-regulatory elements (CREs) are short, non-coding DNA sequences within promoters that serve as binding sites for transcription factors and other regulatory proteins. CREs are pivotal in defining the specificity of gene expression patterns, enabling plants to finely tune their transcriptional responses.

The specificity and complexity of plant transcription are largely dictated by the combinatorial interactions between TFs and their corresponding CREs. These TFs can act as activators or repressors, depending on the context. For instance, light-responsive elements (LREs) within the promoters of photosynthesis-related genes bind to TFs activated by light, thus ensuring that these genes are expressed in response to light stimuli.

Additionally, stress-responsive elements like the dehydration-responsive element (DRE) and heat shock element (HSE) play crucial roles in enabling plants to rapidly activate stress-related genes when facing environmental challenges. These various layers of regulation highlight the intricate mechanisms of plants to fine-tune gene expression in response to both internal and external signals.

**Figure 4:** Diversity of Enhancer Organization Models. Shown are a variety of enhancer organization models, showcasing the different ways transcription factors (TFs) can interact with cis-regulatory elements (CREs) on DNA. Alternate Complex Formation - Transcription factors can function both as transcriptional activators and repressors, depending on the cofactors. Transcription Factor Collective - TFs bind cooperatively, using both DNA and proteins as scaffolds. Enhanceosome - TFs bind to CREs in a precise order and orientation. Billboard Model - CREs maintain their composition and position, with regulatory output varying based on TF expression and activity.

Post-Transcriptional Regulation

In plants, post-transcriptional modification is essential for fine-tuning gene expression and enabling adaptive responses to environmental changes. One prominent example is alternative splicing, where a single gene generates multiple mRNA isoforms, each capable of being translated into different protein variants. This process significantly expands proteome diversity, offering plants the flexibility to adapt to various conditions. Additionally, the stability and decay of mRNA, influenced by RNA-binding proteins and microRNAs, further contribute to the regulation of gene expression at the post-transcriptional level.

Translation in Plants

During transcription, a precursor mRNA (pre-mRNA) is synthesized. It undergoes 5' capping, splicing, and polyadenylation to form mature mRNA, which is then translated into protein by the ribosome.

The 5' untranslated region (UTR), located between the transcription start site (TSS) and the coding sequence (CDS), plays a key role in regulating gene expression. Although it doesn't code for proteins, the 5' UTR influences translation initiation through elements like upstream open reading frames (uORFs), internal ribosome entry sites (IRES), and secondary structures that either enhance or repress translation.

Beyond translation, the 5' UTR also affects mRNA stability, crucial for determining gene expression levels. Its interactions with RNA-binding proteins or microRNAs (miRNAs) can stabilize or degrade mRNA, thus controlling protein production.

In plants, the 5' UTR can respond to environmental signals, with alternative transcription start sites generating mRNA isoforms of varying lengths. These variations impact translation efficiency, mRNA localization, or stability, adding complexity to gene expression regulation.

Plant SynBio Standards

Just as in other engineering disciplines, standardization has been crucial to the growth and innovation in synthetic biology. This is particularly evident in iGEM, where standardization of biological parts is a key aspect, enabling teams to share and build upon each other's work.

In the context of plant synthetic biology, the importance of standardization is amplified due to the inherent complexity of engineering multicellular eukaryotes. The introduction of BioBricks standards marked a significant milestone in microbial engineering, and similar principles are now being applied to plant systems to streamline and enhance their development.

A major advancement in this area is the adoption of the Phytobricks standard, which provides a framework for the standardization of DNA parts used in assembling eukaryotic transcriptional units. This standard is rooted in the widely recognized Golden Gate cloning method, which allows for the efficient and precise assembly of multiple DNA components in a single reaction.

**Figure 5:** Schematic representation of plant genetic parts designed according to the PhytoBrick standard, recommended for iGEM submission under RFC106. The PhytoBrick standard allows modular assembly of genetic constructs, including Promoter, 5' UTR, Coding Sequence (CDS) and 3' UTR. The figure shows key components and different layout options, emphasizing the standardization and flexibility in designing plant synthetic biology parts.

The Phytobricks standard introduces a common syntax, consisting of 12 fusion sites, which facilitates the consistent assembly of genetic parts across different projects and laboratories. This standardization not only simplifies the DNA assembly process but also fosters collaboration within the scientific community, as it enables researchers to build on each other's work using a shared set of tools and protocols.
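The idea of a shared syntax can be illustrated with a small compatibility check. The sketch below assumes three part types and the 4-nt fusion sites commonly cited for them in the plant common syntax (GGAG, AATG, GCTT, CGCT). These sequences and the part names are assumptions for illustration, are not taken from this page, and should be verified against RFC106 before use.

```python
# Assumed 4-nt fusion sites (5' site, 3' site) for three part types.
# These values are illustrative assumptions; verify them against RFC106.
FUSION_SITES = {
    "promoter_5utr": ("GGAG", "AATG"),
    "cds": ("AATG", "GCTT"),
    "3utr_terminator": ("GCTT", "CGCT"),
}


def can_assemble(part_order):
    """Return True if each part's 3' site equals the next part's 5' site."""
    sites = [FUSION_SITES[name] for name in part_order]
    return all(left[1] == right[0] for left, right in zip(sites, sites[1:]))


print(can_assemble(["promoter_5utr", "cds", "3utr_terminator"]))  # True
print(can_assemble(["cds", "promoter_5utr"]))                     # False
```

Because every part of a given type carries the same pair of sites, any promoter + 5' UTR part can be exchanged for another without redesigning the neighboring parts.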

Furthermore, the Phytobricks standard is designed to be compatible with other widely adopted systems, such as GoldenBraid2.0 and MoClo, which are extensively utilized in plant research. This compatibility ensures that parts created using the Phytobricks standard can be seamlessly integrated into existing projects and work alongside other standardized components. This integration enhances the modularity and scalability of synthetic biology applications in plants, making it easier to develop complex biological systems in a systematic and efficient manner.

Design

In plant bioengineering, the precise modulation of transgene expression is essential for a variety of applications, including the construction of gene circuits, plant metabolic engineering, and enhancing the efficacy of gene editing efforts. Achieving this control is critical for optimizing gene function and ensuring successful outcomes in these complex engineering tasks. One of the most effective approaches to modulating gene expression has been the development of part toolboxes and libraries.

A notable example of the impact of these regulatory elements is seen in Escherichia coli, where the early development of the Anderson promoter library has played a pivotal role. These promoters have been widely adopted in science, particularly in iGEM, where teams have utilized these parts to improve their experimental designs and metabolic engineering efforts. The success of these promoters in microbial systems highlights the potential for similar approaches in plant systems. However, current efforts in plant bioengineering are hindered by the reliance on a limited set of strong promoters. This limitation restricts the ability to undertake more nuanced and refined genetic engineering endeavors in planta, where a broader range of transcriptional control is necessary.

To overcome this limitation, our strategy focused on characterizing a suite of constitutive promoters for dandelion that provide a broad spectrum of transcriptional levels. The goal was to create a versatile toolkit in the Phytobricks standard that would allow for precise control over gene expression in T. kok-saghyz. This toolkit was designed not only to include promoters that span a wide dynamic range, but also to integrate other critical regulatory elements, such as 3’ UTRs.

Design Strategy

To achieve our goal of developing a versatile toolbox for T. kok-saghyz, we utilized a publicly available RNA-seq dataset, covering a wide array of tissues and developmental stages.

We drew inspiration from the methodology described in the study “A Suite of Constitutive Promoters for Tuning Gene Expression in Plants” by Zhou et al. (18), which effectively used RNA-seq data to develop a promoter library for Arabidopsis thaliana. Our approach adapted and expanded on this strategy, tailoring it to T. kok-saghyz. The foundation of our promoter discovery pipeline was the integration of transcriptomics data into a computational framework designed to pinpoint promoters with stable and consistent expression patterns across different tissues and developmental stages.

A critical aspect of this approach was the use of statistical analysis, particularly focusing on the mean and variance of gene expression across these various samples. High mean expression levels were indicative of promoters capable of driving strong gene expression, making them prime candidates for inclusion in our toolbox. Equally important was how uniform this expression was across samples, as measured by low variance. Promoters with low variance across samples are more likely to provide reliable and consistent expression across different tissues and developmental stages, which is essential for their utility in a wide range of applications. The result was a targeted set of candidate promoters with the desired characteristics for our toolbox.

Once the candidate genes were identified, we extracted their upstream regulatory regions, covering approximately 1.8 to 2.2 kilobases directly before the start codon. These regions are expected to include the core promoter elements and essential cis-regulatory sequences responsible for the observed expression patterns. By focusing on these upstream regions, we aimed to capture the critical elements that drive stable and strong gene expression in T. kok-saghyz.
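The extraction step can be sketched as follows. This is a minimal illustration, not the pipeline we ran: the function names, the 1-based inclusive coordinate convention, and the default length of 2000 bp are assumptions. The main point is that genes on the minus strand need the sequence downstream of their forward-strand coordinates, reverse-complemented.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def upstream_region(chrom_seq, cds_start, cds_end, strand, length=2000):
    """Return up to `length` bases directly before the start codon.

    cds_start and cds_end are 1-based inclusive forward-strand coordinates
    of the coding sequence (hypothetical convention for this sketch).
    The result is always written 5'->3' relative to the gene.
    """
    if strand == "+":
        begin = max(0, cds_start - 1 - length)
        return chrom_seq[begin:cds_start - 1]
    if strand == "-":
        end = min(len(chrom_seq), cds_end + length)
        return reverse_complement(chrom_seq[cds_end:end])
    raise ValueError("strand must be '+' or '-'")


# Toy example: start codon ATG at positions 9-11 on the plus strand.
print(upstream_region("TTTTGGGGATGCCC", 9, 14, "+", length=4))  # GGGG
```

Regions that would run past a chromosome end are simply truncated, so the returned sequence can be shorter than the requested length.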

Transcriptomics

Our analysis was built on a published chromosomal-level genome assembly of T. kok-saghyz, which provided basic annotations for coding sequences (CDS) and untranslated regions (UTRs) (GWH:19732)(19).

To put our transcriptomic analysis into practice, we conducted an extensive search for publicly available RNA-seq data, eventually identifying a suitable dataset within the Genome Sequencing Archive from the China National Center for Bioinformation (GSA:CRA003851)(20).

**Figure 6:** Illustration of plant developmental stages and tissues used in the RNA-seq study. The left side shows growth at 1.5 and 3 months, while the right side shows a 7-month-old plant. Sampled tissues include leaf, lateral root, main root, and stem for the 1.5- and 3-month-old plants, and flowers, laticifer, peduncle, and seeds for the 7-month-old plants. Figure was adapted from (20).

The selected dataset comprised RNA-seq data from various tissue samples collected at different developmental stages. Specifically, the dataset included tissue samples from flowers, laticifer, peduncle, and seeds of seven-month-old plants. Additionally, it contained samples from leaf, lateral root, main root, and stem tissues of 1.5-month-old plants. These tissues were sampled again after three months, providing a comprehensive dataset of 12 biological samples, each with three technical replicates.

Before proceeding with the downstream analyses, we assessed the quality of the raw RNA-seq reads using FastQC, a widely adopted tool for quality control in sequencing projects. FastQC provided insights into various quality metrics, such as per-base sequence quality, sequence duplication levels, and adapter content. After ensuring that the quality of the RNA-seq runs was suitable for our research objectives, we moved forward with the transcriptomic analysis.

Given the critical importance of accurate transcript quantification, we explored several transcriptomics pipelines, categorized into two main approaches: alignment-based and alignment-free methods.

We initially investigated a custom pipeline based on the popular STAR aligner. However, after thorough testing, we became concerned that the reliance on annotated splicing sites might introduce inaccuracies in quantification, particularly when the annotations were incomplete or inaccurate.

Given our concerns with alignment-based approaches, we turned to alignment-free methods, including Kallisto, Sailfish, and Salmon. These methods do not rely on predefined annotated splicing sites but instead operate directly on the transcriptome, which was assembled from the RNA-seq data presented in the study. Since alignment-free methods bypass the need for exact splicing site annotations, they are better suited to our approach, providing more accurate quantification of transcript abundance, particularly for novel or unannotated transcripts that may not be well-represented in the genome annotation.

Moreover, alignment-free methods offer practical advantages by being significantly faster and less computationally demanding than alignment-based methods, making them more accessible to iGEM teams.

**Figure 7:** Schematic overview of the RNA-seq differential expression analysis pipeline we constructed. The figure illustrates both the alignment-based method that was investigated and the alignment-free pipeline that we ultimately chose for the analysis. The alignment-based approach involves steps such as splice-aware alignment using STAR, while the alignment-free method utilizes Salmon for generating pseudocounts directly from the reference transcriptome. Both pipelines include steps for quality control, feature counting, and statistical analysis of differential gene expression using DESeq2.

Given these findings, we selected the alignment-free method using Salmon for our primary transcriptomic analysis. Salmon was particularly advantageous as it directly produced the pseudocounts necessary for differential gene expression (DE) analysis.

Following the quantification of transcript abundance, we performed additional quality control measures, which were aggregated using MultiQC to obtain a comprehensive overview of our data quality and analytical workflow.

The pseudocount data generated by Salmon was subsequently imported into DESeq2 via tximport for DE analysis.

To select promising gene candidates, we applied regularized log (rlog) and variance stabilizing transformations (VST) to the data. Afterward, we performed quantile filtering, selecting genes with mean expression levels in the top 95th percentile and expression variance below 30%. This stringent filtering process identified a subset of enriched genes, which were then ranked by the ratio of variance to mean, referred to here as the coefficient of variation. This process resulted in a final list of 764 unique genes for further analysis.
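The filtering logic can be sketched in a few lines of Python. This is an illustration under stated assumptions, not our actual R code: it interprets the thresholds as "mean at or above the 0.95 quantile" and "variance at or below the 0.30 quantile", uses the sample variance, and ranks by the variance-to-mean ratio described above. The function names and the toy data are hypothetical.

```python
from statistics import mean, variance


def quantile(values, q):
    """Quantile with linear interpolation, for 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def select_stable_genes(expr, mean_q=0.95, var_q=0.30):
    """Keep genes with high mean and low variance, ranked by variance/mean.

    expr maps a gene ID to its transformed expression values, one per sample.
    The quantile interpretation of the thresholds is an assumption.
    """
    stats = {gene: (mean(v), variance(v)) for gene, v in expr.items()}
    mean_cut = quantile([m for m, _ in stats.values()], mean_q)
    var_cut = quantile([s for _, s in stats.values()], var_q)
    kept = [
        (gene, m, s)
        for gene, (m, s) in stats.items()
        if m > 0 and m >= mean_cut and s <= var_cut
    ]
    return sorted(kept, key=lambda item: item[2] / item[1])


# Toy data with relaxed thresholds so that the small example keeps a gene.
toy = {
    "g_high_stable": [10, 10, 11, 9],
    "g_high_noisy": [2, 20, 5, 13],
    "g_low_stable": [1, 1, 1, 1],
    "g_low_noisy": [0, 4, 0, 4],
}
print([gene for gene, _, _ in select_stable_genes(toy, mean_q=0.5, var_q=0.5)])
```

Ranking by a dispersion ratio rather than by variance alone keeps highly expressed genes from being penalized for absolute fluctuations that are small relative to their expression level.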

To narrow down our list of potential candidates, we further analyzed the selected genes for unwanted restriction sites upstream and downstream of their coding regions. After this refinement, we were left with a few hundred candidates, from which we selected 20 genes of interest, each with differing coefficients of variation, for downstream evaluation and incorporation into our genetic toolbox.
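A restriction-site screen of this kind can be sketched as below. The enzymes are an assumption: BsaI (GGTCTC) and BsmBI (CGTCTC) are Type IIS enzymes typically avoided in Golden Gate based standards, but the exact set we screened for is not listed here. The function names are hypothetical. Both strands are checked by also searching for the reverse complement of each recognition site.

```python
# Assumed recognition sites; the actual set of screened enzymes may differ.
TYPE_IIS_SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq, sites=TYPE_IIS_SITES):
    """Map enzyme name -> sorted 0-based positions of its site on either strand."""
    seq = seq.upper()
    hits = {}
    for enzyme, site in sites.items():
        patterns = {site, reverse_complement(site)}
        positions = sorted(
            i
            for pattern in patterns
            for i in range(len(seq) - len(pattern) + 1)
            if seq.startswith(pattern, i)
        )
        if positions:
            hits[enzyme] = positions
    return hits


print(find_sites("AAAGGTCTCAAA"))  # {'BsaI': [3]}
print(find_sites("ACGTACGT"))      # {}
```

Candidates for which the function returns an empty dictionary need no domestication before cloning, which is why they were preferred.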

Part Identification, Context & Function

After identifying the potential candidates, we wanted to investigate the function of the associated proteins. This step was essential for assessing the accuracy of our method by determining whether our filtered selection contained known housekeeping genes, which are consistently expressed across tissue samples.

To achieve this, the transcripts were mapped to the annotated genes within the chromosomal assembly of T. kok-saghyz. Although the coding sequences were annotated, they carried only chromosome-based identifiers, comparable to those in A. thaliana and S. cerevisiae, and lacked additional details such as the names of the proteins they encode.

To gain deeper insights into the genes under study, we explored several annotation tools, ultimately choosing eggNOG-mapper. eggNOG is a bioinformatics pipeline and database that provides orthology assignments and functional annotations for a wide range of organisms, including plants. It identifies orthologous groups within plant species, allowing us to infer functional information from similar genes.

Using this approach, we obtained annotations for approximately 35,000 proteins, covering the majority of known genes in T. kok-saghyz.

In addition to identifying the associated proteins, the pipeline also assigned functional Gene Ontology (GO) terms and KEGG pathways to these genes.

To further analyze their biological significance, we examined both the genetic context and the functions of the associated proteins.

Identification

Following the filtering that resulted in the selection of a few hundred unique candidate genes, we proceeded to conduct a more focused analysis to identify genes with desirable characteristics. One such gene that stood out was later identified as PTI1, a gene that demonstrated a combination of low variance and moderate mean expression across various tissue types and developmental stages.

**Figure 8:** Gene expression profile of all identified gene transcripts across 12 samples (with 3 biological replicates each), representing different tissue types and developmental stages. The y-axis displays the variance between samples, calculated using Variance Stabilizing Transformation (VST), while the x-axis shows the mean transcript expression in a regularized log scale. Dashed lines indicate the quantile filtering thresholds applied for both variance and mean expression. The purple-shaded region highlights the area of interest, containing 764 genes that were selected for further analysis. The gene PTI1, corresponding to this regulatory part, is highlighted within this region.

To better understand the potential regulatory utility of PTI1, we focused on its specific expression pattern across various tissues and developmental stages. Although PTI1 is not among the highest-expressing genes, it exhibits remarkably low variance, which is a crucial feature for Plant SynBio, where predictable and stable gene expression is often more valuable than sheer expression strength.

**Figure 9:** Normalized counts of PTI1 across various tissue types and developmental stages using DESeq2’s median of ratios normalization method. Each bar shows the mean normalized count for PTI1 across the samples, with error bars indicating the standard deviation. The dots represent individual biological replicates for each sample.
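For readers unfamiliar with the median of ratios method, the following is a simplified pure-Python sketch of the idea, not DESeq2's implementation. Each gene's counts are divided by that gene's geometric mean across samples, and the median of these ratios within a sample is that sample's size factor. The function names and toy counts are hypothetical.

```python
from math import exp, log
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors; counts maps gene -> raw counts per sample."""
    n_samples = len(next(iter(counts.values())))
    ratios = [[] for _ in range(n_samples)]
    for values in counts.values():
        if any(v <= 0 for v in values):
            continue  # a zero count makes the geometric mean zero, so skip the gene
        geo_mean = exp(sum(log(v) for v in values) / n_samples)
        for j, v in enumerate(values):
            ratios[j].append(v / geo_mean)
    return [median(r) for r in ratios]


def normalize(counts):
    """Divide every count by the size factor of its sample."""
    factors = size_factors(counts)
    return {
        gene: [v / f for v, f in zip(values, factors)]
        for gene, values in counts.items()
    }


# Toy data: the second sample was sequenced twice as deeply as the first.
toy_counts = {"a": [10, 20], "b": [100, 200], "c": [5, 10]}
print(size_factors(toy_counts))
print(normalize(toy_counts)["a"])
```

In the toy data the two samples differ only in sequencing depth, so after normalization each gene has the same value in both samples.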

To quantify PTI1's expression stability more precisely, we summarized key statistical measures derived from both the VST and regularized log transformations.

Genetic Context and Gene Structure

**Figure 10:** Genomic map showing the distribution of selected genes across nine chromosomes (Chr 1 to Chr 9). The scale on the left represents chromosome lengths in megabases (Mb). The positions of the genes are indicated along the chromosomes, with the gene PTI1 emphasized in bold.

Understanding the genomic context of genes provides insights into their potential regulatory mechanisms and interactions with neighboring genes, which can be critical for their function. The genomic map in Figure 10 illustrates the distribution of our selected genes across the nine chromosomes of T. kok-saghyz.

Corresponding Protein and Function

**Figure 11:** Hierarchical overview of Gene Ontology (GO) terms associated with the PTI1 protein, as predicted by eggNOG during the annotation process. The GO terms are organized into categories representing Molecular Function, Biological Process, and Cellular Component, showing the relationships and hierarchies between them. This visualization was generated using the EBI's QuickGO tool and provides a detailed breakdown of the functional annotations associated with the PTI1 gene.

The hierarchical GO view included more detail than needed, making it difficult to digest. Inspired by the GO ribbon on UniProt, which offers a simplified summary of gene functions, we sought a similar visualization for our gene. However, no tools were available to create such visualizations for non-model organisms.

To address this, we developed our own tool to map the GO terms to the 'plant slimset,' a condensed version of GO terms specifically tailored to highlight key functional categories in plants. This approach enabled us to create a ribbon diagram summarizing the gene's GO annotations across Biological Process, Molecular Function, and Cellular Component.
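The core of such a mapping can be sketched as a walk up the GO graph. The code below is a simplified illustration and not our tool: it collects every slim term that is the annotated term itself or one of its ancestors, then counts how many annotations fall under each slim term. The term labels and the tiny parent graph are toy placeholders, not real GO identifiers.

```python
from collections import Counter


def map_to_slim(term, parents, slim):
    """Return all slim terms that are `term` itself or one of its ancestors."""
    found, seen, stack = set(), set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in slim:
            found.add(current)
        stack.extend(parents.get(current, ()))
    return found


def ribbon_counts(annotations, parents, slim):
    """Count annotations per slim term; these counts drive the ribbon shading."""
    counts = Counter()
    for term in annotations:
        counts.update(map_to_slim(term, parents, slim))
    return counts


# Toy graph with placeholder labels (child -> list of parents).
toy_parents = {
    "protein_kinase_activity": ["kinase_activity"],
    "kinase_activity": ["catalytic_activity"],
    "catalytic_activity": ["molecular_function"],
}
toy_slim = {"kinase_activity", "molecular_function"}

print(ribbon_counts(["protein_kinase_activity", "catalytic_activity"],
                    toy_parents, toy_slim))
```

A production tool would read the parent relations from the ontology file and the slim terms from the published plant subset instead of hard-coding them.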

**Figure 12:** Ribbon diagram summarizing GO annotations for PTI1. The diagram is divided into three sections (Biological Process, Molecular Function, and Cellular Component), each representing high-level GO terms. The color gradient from white to green indicates the number of corresponding annotations, with deeper shades representing higher values.

Testing & Measurement

Dual Fluorescence Reporter Assay

Experiments in plant biology are often susceptible to high variability, with factors like transformation efficiency, cell viability, and environmental conditions contributing to noisy results.

To mitigate these issues, we employed a dual fluorescence reporter assay. In this setup, both the target and reference proteins are expressed from the same plasmid, allowing for precise and reliable characterization.

Using a second, constant fluorescence reporter as an internal reference allows us to normalize the readout, ensuring that the observed fluorescence accurately reflects the activity of the regulatory elements under study, independent of external factors.
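The normalization itself is a simple ratio. The sketch below assumes background-subtracted fluorescence readings for the target and reference channels. The function name, the background handling, and the example numbers are illustrative assumptions, not measured data.

```python
def normalized_ratio(target, reference, target_bg=0.0, reference_bg=0.0):
    """Return background-corrected target signal divided by reference signal."""
    corrected_target = target - target_bg
    corrected_reference = reference - reference_bg
    if corrected_reference <= 0:
        raise ValueError("reference signal must exceed its background")
    return corrected_target / corrected_reference


# Two hypothetical samples with very different transformation efficiency
# give the same ratio, because both reporters scale together.
print(normalized_ratio(1100, 2100, target_bg=100, reference_bg=100))  # 0.5
print(normalized_ratio(350, 600, target_bg=100, reference_bg=100))    # 0.5
```

Raising an error when the reference channel is at or below background avoids reporting ratios from samples that were effectively not transformed.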

**Figure 13:** SBOL scheme of our Tarakate - consensus measurement construct [BBa_K5088677].

Choice of Reporter Genes

For our initial round of testing, we evaluated a construct design using the plant GFP from the iGEM distribution kit as a reporter system.

In our initial experiments, we were unable to detect any fluorescence in dandelion leaf infiltrations or protoplast transformations. To troubleshoot, we tested the construct in tobacco leaf infiltrations. This process required several rounds of protocol optimization before we finally achieved positive GFP expression in tobacco leaves. After further adjustments, we also obtained very low GFP signals in dandelion protoplasts.

Despite these optimizations, the signal strength remained weak, registering only about three times above background levels. This indicated that the system was still not sensitive enough for effective promoter characterization. The avGFP from the iGEM distribution kit, an older and less bright variant, likely contributed to the low signal observed.

To address this, we switched to eGFP, a more advanced GFP variant known for its higher brightness. In addition, our literature research showed that Agrobacterium itself is able to use some plant promoters, which would lead to fluorescence that is indistinguishable from expression in the plant tissue.

Therefore, we introduced the potato ST-LS1 intron into the eGFP coding sequence to prevent Agrobacterium from expressing GFP, a strategy that has been described as suitable for this problem.
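The logic behind the intron strategy can be illustrated with a toy example: bacteria cannot splice, so an intron that carries an in-frame stop codon truncates the reading frame in Agrobacterium, while splicing in the plant restores the original coding sequence. The sketch below is a simplified model under stated assumptions. The sequences `toy_cds` and `toy_intron` are short made-up placeholders, not the real eGFP or ST-LS1 sequences, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def insert_intron(cds, intron, position):
    """Insert an intron into a coding sequence at a 0-based nucleotide position."""
    return cds[:position] + intron + cds[position:]


def first_in_frame_stop(seq):
    """Return the 0-based codon index of the first in-frame stop codon, or None."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


# Toy placeholders (assumptions for illustration only).
toy_cds = "ATGGTGAGCAAGGGCGAGGAGCTGTTCACCTAA"  # 10 sense codons + stop
toy_intron = "GTAAGTTAATTAACTGATTTGCAG"        # starts with GT, ends with AG

# Insert the intron after the third codon (nucleotide position 9).
interrupted = insert_intron(toy_cds, toy_intron, 9)

# Without splicing (bacterial case) the reading frame hits a stop early.
print(first_in_frame_stop(toy_cds), first_in_frame_stop(interrupted))

# With splicing (plant case) the intron is removed and the CDS is restored.
spliced = interrupted.replace(toy_intron, "", 1)
print(spliced == toy_cds)
```

In this toy case, the unspliced sequence stops after five codons instead of ten, so no full-length reporter would be made without splicing.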

A further challenge was the variability in transformation efficiency during Agrobacterium-mediated leaf infiltration and protoplast transformation, which made it difficult to measure the individual regulatory parts quantitatively across biological replicates. To overcome this issue, we adopted the previously reported ratiometric approach and incorporated a second reporter gene into each construct. This reference reporter is located on the same plasmid as the GFP to minimize noise and provide a reliable normalization standard for more accurate quantitative analysis. To read more about the optimization of our reporter construct, visit our engineering page.
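How a same-plasmid reference reporter reduces replicate-to-replicate variability can be shown with made-up numbers. The following sketch compares the coefficient of variation (CV) of raw GFP readings with the CV of the GFP/reference ratios. The readings and the helper name `coefficient_of_variation` are assumptions for illustration, not measured data.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


# Hypothetical readings from three biological replicates of one construct.
# Transformation efficiency differs, so both channels vary together.
gfp = [1200.0, 400.0, 800.0]
reference = [600.0, 200.0, 400.0]

ratios = [g / r for g, r in zip(gfp, reference)]

cv_raw = coefficient_of_variation(gfp)
cv_ratio = coefficient_of_variation(ratios)
print(f"CV of raw GFP: {cv_raw:.2f}, CV of ratios: {cv_ratio:.2f}")
```

In this idealized case the ratio removes the variability entirely. Real data will retain some residual noise, but the same comparison can be used to check whether normalization helps.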

Transient Transformation

The characterization of genetic parts is crucial for understanding their functionality and behavior within a host organism. However, this process presents significant challenges, particularly when working with plants, due to the time-intensive nature of stable transformation. Stable transformation involves the integration of foreign DNA into the host genome, ensuring that the genetic modifications are inherited by subsequent generations. While this approach is crucial for characterizing the parts in their intended context, it also comes with certain drawbacks.

Developing stable transgenic lines is time-consuming and resource-intensive. Because transgene expression is influenced by the local genetic context, multiple homozygous lines are needed to ensure consistent and detectable expression. This can be especially challenging for iGEM projects. Transient transformation offers a faster and more efficient alternative for the rapid characterization of genetic parts.

Transient transformation enables temporary expression of a reporter construct without the need for stable integration into the host genome, significantly reducing the time required to obtain results and allowing for faster iteration and optimization of genetic constructs. Techniques such as leaf infiltration and protoplast assays are already standard in plant biology when working with model organisms like Nicotiana benthamiana. We chose to adapt and optimize these methods for Taraxacum in order to test genetic constructs more easily.

Leaf Infiltration

Leaf infiltration is a widely used method in plant biology for the transient introduction of foreign DNA into plant cells. This technique has been effectively employed by various iGEM teams to rapidly test genetic constructs in plants.

The method utilizes the natural ability of Agrobacterium to transfer genetic material into plant cells. During the process, plant leaves are infiltrated with a suspension of Agrobacterium carrying a plasmid containing the desired genetic construct. The bacteria then transfer the DNA into the plant cells, where it is expressed temporarily without integrating into the plant genome.

As a transient expression system, however, leaf infiltration can be susceptible to gene silencing, particularly when the introduced genes are highly expressed. To address this issue, we delivered the P19 protein alongside the reporter gene, using a separate plasmid in a different Agrobacterium strain (GV3101). P19 functions as an RNA silencing suppressor, helping to maintain stable and high levels of transient gene expression.

While this strategy helps reduce the effects of gene silencing, the use of P19 also raises some concerns. Literature suggests that P19 can artificially enhance the expression of proteins, such as GFP, potentially confounding experimental results.

Protoplast

Measurement Setup

Results

The Dandelion Toolbox

Overview

Part Identifier	Part Type	Nickname	Part Description
BBa_K5088000	Promoter + 5'UTR	P_PTI1	Protein tyrosine kinase - Promoter+5'UTR from T. kok-saghyz
BBa_K5088001	Promoter + 5'UTR	P_RPL28	Large subunit ribosomal protein L28e - Promoter+5'UTR from T. kok-saghyz
BBa_K5088002	Promoter + 5'UTR	P_GSK3B	Glycogen synthase kinase 3 - Promoter+5'UTR from T. kok-saghyz
BBa_K5088003	Promoter + 5'UTR	P_MGRN1	E3 ubiquitin-protein ligase - Promoter+5'UTR from T. kok-saghyz
BBa_K5088004	Promoter + 5'UTR	P_betB	Betaine-aldehyde dehydrogenase - Promoter+5'UTR from T. kok-saghyz
BBa_K5088005	Promoter + 5'UTR	P_pgm	Phosphoglucomutase - Promoter+5'UTR from T. kok-saghyz
BBa_K5088006	Promoter + 5'UTR	P_FKBP4_5	FK506-binding protein 4/5 - Promoter+5'UTR from T. kok-saghyz
BBa_K5088007	Promoter + 5'UTR	P_CLTC	Clathrin - Promoter+5'UTR from T. kok-saghyz
BBa_K5088008	Promoter + 5'UTR	P_RPL31	Large subunit ribosomal protein L31e - Promoter+5'UTR from T. kok-saghyz
BBa_K5088009	Promoter + 5'UTR	P_CUL1	Cullin - Promoter+5'UTR from T. kok-saghyz
BBa_K5088010	Promoter + 5'UTR	P_VPS4	Vacuolar protein-sorting-associated protein 4 - Promoter+5'UTR from T. kok-saghyz
BBa_K5088011	Promoter + 5'UTR	P_EIF2S3	Translation initiation factor 2 subunit 3 - Promoter+5'UTR from T. kok-saghyz
BBa_K5088012	Promoter + 5'UTR	P_Tubulin	Tubulin - Promoter+5'UTR from T. kok-saghyz
BBa_K5088013	Promoter + 5'UTR	P_EIF5A	Translation initiation factor 5A - Promoter+5'UTR from T. kok-saghyz
BBa_K5088050	Inducible Promoter + 5'UTR	P_HSP12.6	HSP12.6 - Heat inducible promoter+5'UTR from T. kok-saghyz
BBa_K5088051	Inducible Promoter + 5'UTR	P_HSP23.5	HSP23.5 - Heat inducible promoter+5'UTR from T. kok-saghyz
BBa_K5088102	3'UTR	T_PTI1	Protein tyrosine kinase - 3'UTR from T. kok-saghyz
BBa_K5088103	3'UTR	T_RPL28	Large subunit ribosomal protein L28e - 3'UTR from T. kok-saghyz
BBa_K5088104	3'UTR	T_EPS15	Epidermal growth factor receptor substrate 15 - 3'UTR from T. kok-saghyz
BBa_K5088105	3'UTR	T_GSK3B	Glycogen synthase kinase 3 - 3'UTR from T. kok-saghyz
BBa_K5088106	3'UTR	T_MGRN1	E3 ubiquitin-protein ligase - 3'UTR from T. kok-saghyz
BBa_K5088107	3'UTR	T_RPL35A	Large subunit ribosomal protein L35Ae - 3'UTR from T. kok-saghyz
BBa_K5088108	3'UTR	T_betB	Betaine-aldehyde dehydrogenase - 3'UTR from T. kok-saghyz
BBa_K5088109	3'UTR	T_pgm	Phosphoglucomutase - 3'UTR from T. kok-saghyz
BBa_K5088110	3'UTR	T_ATP-synt	ATPase subunit gamma - 3'UTR from T. kok-saghyz
BBa_K5088111	3'UTR	T_EIF3B	Translation initiation factor 3 subunit B - 3'UTR from T. kok-saghyz
BBa_K5088112	3'UTR	T_RPL31	Large subunit ribosomal protein L31e - 3'UTR from T. kok-saghyz
BBa_K5088113	3'UTR	T_TM9SF2_4	Transmembrane 9 superfamily member 2/4 - 3'UTR from T. kok-saghyz
BBa_K5088114	3'UTR	T_CUL1	Cullin - 3'UTR from T. kok-saghyz
BBa_K5088115	3'UTR	T_PSMB6	20S proteasome subunit beta 1 - 3'UTR from T. kok-saghyz
BBa_K5088116	3'UTR	T_RPSA	Small subunit ribosomal protein SAe - 3'UTR from T. kok-saghyz
BBa_K5088117	3'UTR	T_VPS4	Vacuolar protein-sorting-associated protein 4 - 3'UTR from T. kok-saghyz
BBa_K5088118	3'UTR	T_EIF2S3	Translation initiation factor 2 subunit 3 - 3'UTR from T. kok-saghyz

Table 1: Overview of the parts in the Dandelion Toolbox, listing the part identifier, part type, nickname, and description of each part.

Dandelion Handbook

References

(1) J. B. van Beilen, Y. Poirier, Establishment of new crops for the production of natural rubber. Trends Biotechnol. 25, 522–529 (2007).

(2) MRC, Malaysian Rubber Council (MRC), MRC Official Website. https://www.myrubbercouncil.com/.

(3) S. Cherian, S. B. Ryu, K. Cornish, Natural rubber biosynthesis in plants, the rubber transferase complex, and metabolic engineering progress and prospects. Plant Biotechnol. J. 17, 2041–2061 (2019). https://doi.org/10.1111/pbi.13181

(4) R. Lieberei, South American Leaf Blight of the Rubber Tree (Hevea spp.): New Steps in Plant Domestication using Physiological Features and Molecular Markers. Ann. Bot. 100, 1125–1142 (2007).

(5) T. S. Suryanarayanan, J. L. Azevedo, From forest to plantation: a brief history of the rubber tree. Indian J. Hist. Sci. 58, 74–78 (2023).

(6) Y. Wang, P. M. Hollingsworth, D. Zhai, C. D. West, J. M. H. Green, H. Chen, K. Hurni, Y. Su, E. Warren-Thomas, J. Xu, A. Ahrends, High-resolution maps show that rubber causes substantial deforestation. Nature 623, 340–346 (2023).

(7) D. A. Ramirez-Cadavid, K. Cornish, F. C. Michel, Taraxacum kok-saghyz (TK): compositional analysis of a feedstock for natural rubber and other bioproducts. Ind. Crops Prod. 107, 624–640 (2017).

(8) S. Piccolella, C. Sirignano, S. Pacifico, E. Fantini, L. Daddiego, P. Facella, L. Lopez, O. T. Scafati, F. Panara, D. Rigano, Beyond natural rubber: Taraxacum kok-saghyz and Taraxacum brevicorniculatum as sources of bioactive compounds. Ind. Crops Prod. 195, 116446 (2023).

(9) J. Collins-Silva, A. T. Nural, A. Skaggs, D. Scott, U. Hathwaik, R. Woolsey, K. Schegg, C. McMahan, M. Whalen, K. Cornish, D. Shintani, Altered levels of the Taraxacum kok-saghyz (Russian dandelion) small rubber particle protein, TkSRPP3, result in qualitative and quantitative changes in rubber metabolism. Phytochemistry 79, 46–56 (2012).

(10) X. Cao, H. Xie, M. Song, J. Lu, P. Ma, B. Huang, M. Wang, Y. Tian, F. Chen, J. Peng, Z. Lang, G. Li, J.-K. Zhu, Cut–dip–budding delivery system enables genetic modifications in plants without tissue culture. The Innovation 4, 100345 (2023).

(11) A. Stolze, A. Wanke, N. van Deenen, R. Geyer, D. Prüfer, C. Schulze Gronover, Development of rubber-enriched dandelion varieties by metabolic engineering of the inulin pathway. Plant Biotechnol. J. 15, 740–753 (2017).

(12) N. van Deenen, K. Unland, D. Prüfer, C. Schulze Gronover, Oxidosqualene Cyclase Knock-Down in Latex of Taraxacum koksaghyz Reduces Triterpenes in Roots and Separated Natural Rubber. Molecules 24, 2703 (2019).

(13) S. M. Wolters, V. A. Benninghaus, K.-U. Roelfs, N. van Deenen, R. M. Twyman, D. Prüfer, C. Schulze Gronover, Overexpression of a pseudo-etiolated-in-light-like protein in Taraxacum koksaghyz leads to a pale green phenotype and enables transcriptome-based network analysis of photomorphogenesis and isoprenoid biosynthesis. Front. Plant Sci. 14 (2023).

(14) V. A. Benninghaus, N. van Deenen, B. Müller, K.-U. Roelfs, I. Lassowskat, I. Finkemeier, D. Prüfer, C. Schulze Gronover, Comparative proteome and metabolome analyses of latex-exuding and non-exuding Taraxacum koksaghyz roots provide insights into laticifer biology. J. Exp. Bot. 71, 1278–1293 (2020).

(15) I. Ganesh, S. C. Choi, S. W. Bae, J.-C. Park, S. B. Ryu, Heterologous activation of the Hevea PEP16 promoter in the rubber-producing laticiferous tissues of Taraxacum kok-saghyz. Sci. Rep. 10, 10844 (2020).

(16) A. Wieghaus, D. Prüfer, C. S. Gronover, Loss of function mutation of the Rapid Alkalinization Factor (RALF1)-like peptide in the dandelion Taraxacum koksaghyz entails a high-biomass taproot phenotype. PLOS ONE 14, e0217454 (2019).

(17) E. Niephaus, B. Müller, N. van Deenen, I. Lassowskat, M. Bonin, I. Finkemeier, D. Prüfer, C. Schulze Gronover, Uncovering mechanisms of rubber biosynthesis in Taraxacum koksaghyz – role of cis-prenyltransferase-like 1 protein. Plant J. 100, 591–609 (2019).

(18) A. Zhou et al., A Suite of Constitutive Promoters for Tuning Gene Expression in Plants. ACS Synth. Biol. 12, 1533–1545 (2023). doi: 10.1021/acssynbio.3c00075.

(19) T. Lin et al., Extensive sequence divergence between the reference genomes of Taraxacum kok-saghyz and Taraxacum mongolicum. Sci. China Life Sci. 65, 515–528 (2022). doi: 10.1007/s11427-021-2033-2.

(20) T. Lin et al., Genome analysis of Taraxacum kok-saghyz Rodin provides new insights into rubber biosynthesis. Natl. Sci. Rev. 5, 78–87 (2018). doi: 10.1093/nsr/nwx101.

(21) G. Vancanneyt, R. Schmidt, A. O’Connor-Sanchez, L. Willmitzer, M. Rocha-Sosa, Construction of an intron-containing marker gene: Splicing of the intron in transgenic plants and its use in monitoring early events in Agrobacterium-mediated plant transformation. Mol. Gen. Genet. 220, 245–250 (1990). doi: 10.1007/BF00260489.

(22) A. F. Ibrahim, J. A. Watters, G. P. Clark, C. J. Thomas, J. W. Brown, C. G. Simpson, Expression of intron-containing GUS constructs is reduced due to activation of a cryptic 5’ splice site. Mol. Genet. Genomics 265, 455–460 (2001). doi: 10.1007/s004380000433.

Sequence and Features

Assembly Compatibility:

RFC[10]: INCOMPATIBLE. Illegal EcoRI site found at 1262; illegal XbaI site found at 184.
RFC[12]: INCOMPATIBLE. Illegal EcoRI site found at 1262.
RFC[21]: INCOMPATIBLE. Illegal EcoRI site found at 1262; illegal BglII site found at 409; illegal XhoI site found at 436.
RFC[23]: INCOMPATIBLE. Illegal EcoRI site found at 1262; illegal XbaI site found at 184.
RFC[25]: INCOMPATIBLE. Illegal EcoRI site found at 1262; illegal XbaI site found at 184; illegal AgeI site found at 1761.
RFC[1000]: COMPATIBLE.
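
A compatibility check of this kind can be approximated by scanning a sequence for forbidden restriction sites on both strands. The sketch below is not the Registry's own tool. The site lists reflect our understanding of RFC[10] (EcoRI, XbaI, SpeI, PstI, NotI) and of the Type IIS standard RFC[1000] (BsaI, SapI) and should be treated as assumptions, as should the 1-based position convention, the function names, and the toy sequence.

```python
# Recognition sites assumed to be illegal under each standard.
RFC10_SITES = {
    "EcoRI": "GAATTC",
    "XbaI": "TCTAGA",
    "SpeI": "ACTAGT",
    "PstI": "CTGCAG",
    "NotI": "GCGGCCGC",
}
RFC1000_SITES = {"BsaI": "GGTCTC", "SapI": "GCTCTTC"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_illegal_sites(sequence, sites):
    """Return sorted (enzyme, 1-based position) hits on either strand."""
    sequence = sequence.upper()
    hits = set()
    for enzyme, site in sites.items():
        # Non-palindromic sites (e.g. BsaI) must also be searched as
        # their reverse complement to cover the opposite strand.
        for pattern in {site, reverse_complement(site)}:
            start = sequence.find(pattern)
            while start != -1:
                hits.add((enzyme, start + 1))
                start = sequence.find(pattern, start + 1)
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


# Toy sequence (made up) with an EcoRI site, an XbaI site, and a
# BsaI site on the reverse strand (GAGACC).
toy = "AAAGAATTCAAATCTAGAAAAGAGACCAAA"
print(find_illegal_sites(toy, RFC10_SITES))
print(find_illegal_sites(toy, RFC1000_SITES))
```

An empty result for a given site list would correspond to a "COMPATIBLE" entry for that standard, under the assumptions stated above.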