# Part:BBa_K4630100

stgRNA-barcode-cassette 1

This part is the barcode-cassette form of the self-targeting guide RNA 1 (stgRNA 1). It is the building block of the cascade recording system. It contains a Lac promoter, the upstream barcode, stgRNA 1, a double terminator, and the downstream barcode. It is used to verify the self-targeting and self-recombination system. The unique barcodes distinguish each cassette and provide unique homologous arms.

### Background of the Cassette

The cassette recorder is at the leading edge of DNA memory devices. By changing cassettes, the recorder can record many rounds of biological events. The method of DNA manipulation is the biological basis of DNA-based recording.
Perli *et al.* demonstrated that, with CRISPR/Cas9 and a self-targeting guide RNA (stgRNA), editing can occur within the stgRNA sequence itself^{1} (fig B1a). Zhao *et al.* introduced the concept of intra-plasmid recombination^{2} (fig B1b).

Figure B1. DNA manipulation methods from the literature

### Usage and Biology

We integrate the two methods into a single cassette with a unique barcode (fig B2). Once triggered, a double-strand break (DSB) is introduced into the working plasmid, enabling intra-plasmid recombination. Because the two homologous arms flank the stgRNA, the entire cassette is deleted by the recombination, and the next cassette is expected to move into the position where it is expressed. Each time the inducer triggers the system, the recorder registers the event by switching to the next cassette.

Figure B2. Working pattern of the change of cassette
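The cassette change described above can be sketched as a string operation. This is a simplified illustration, not the team's model: it assumes the two homologous arms are identical direct repeats, that recombination collapses them into a single copy, and it uses bracketed labels in place of real sequences. The function name `recombine_direct_repeats` and all labels are hypothetical.

```python
def recombine_direct_repeats(plasmid: str, arm: str, cut_site: int) -> str:
    """Collapse the two copies of `arm` flanking `cut_site` into one copy.

    Everything between the start of the upstream arm and the start of the
    downstream arm is deleted, which removes the cassette that was cut.
    """
    upstream = plasmid.rfind(arm, 0, cut_site)   # last arm copy lying fully before the cut
    downstream = plasmid.find(arm, cut_site)     # first arm copy starting at or after the cut
    if upstream == -1 or downstream == -1:
        return plasmid                           # no flanking repeat pair: nothing changes
    return plasmid[:upstream] + plasmid[downstream:]


# Toy plasmid: labels stand in for sequences (hypothetical layout).
ARM = "<arm>"
toy_plasmid = "[pLac]" + ARM + "[stgRNA1][term]" + ARM + "[stgRNA2][term]"

# Assume the DSB falls inside stgRNA 1.
dsb_position = toy_plasmid.index("[stgRNA1]") + 3
after_recording = recombine_direct_repeats(toy_plasmid, ARM, dsb_position)
print(after_recording)
```

In this toy layout, cassette 1 disappears and stgRNA 2 ends up directly downstream of the promoter, which is the behaviour the working pattern relies on.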

### Design

The key component, stgRNA 1, is generated by our guide RNA in-silico generator. As the first cassette, it carries the Lac promoter to initiate induced editing in the proof-of-concept. The double terminator prevents read-through expression of the next cassette. The barcode is also generated by the in-silico generator, so as to avoid recombination with the bacterial genome.

Figure D1. Design of stgRNA-barcode-cassette 1

## Experiments

This part is the key element in the proof-of-concept of the recorder, and it helps us better understand the recording event. It is also used to measure the basic parameters of the recording event. We cloned it into the pCDF-Duet1 plasmid and co-transformed it with pCas (BBa_K4630200) or pCasop (BBa_K4630201) into *Escherichia coli* DH5α.

The overall experiments we conducted on BBa_K4630100 include:

- Verification of the self-targeting and self-recombination
- Exploration of the dependence on inducer concentration and induction time
- Exclusion of the unexpected recombination

Further experiments involving BBa_K4630100 include:

- The construction of multi-cassette recorder with barcode
- The verification of the multi-level recording

### General Induction Protocol

- Pick a colony from the verified plate and cultivate it in 5 ml liquid LB medium for 12 hours.
- Transfer 100 µl of the culture into 3 ml fresh liquid LB medium. Cultivate to an OD600 of 0.3-0.5.
- Add L-Arabinose and IPTG to the chosen final concentrations and induce for the chosen time (details are given in each test).
- Spread the diluted culture on plates with the appropriate antibiotics and cultivate for 12 hours.
- Pick colonies and test the editing result.

## Results

### Strain Construction

We successfully co-transformed our working plasmid with pCas (BBa_K4630200), the vector for Cas9 and Lambda Red, into *Escherichia coli* DH5α (fig 1).

Fig 1 Construction of the working strain

Colony PCR results of the recorder with barcode. The white hollow arrowheads indicate the target bands. Colonies 5 to 12 were successfully co-transformed.

### Pilot Experiment

We tested varying doses of L-Arabinose and IPTG close to the working concentration (tbl 1). L-Arabinose induction lasted 22 hours and IPTG induction 5 hours. Sequencing confirmed successful editing (fig 2a). Large-scale screening showed that condition 2 had the highest efficiency: the efficiencies for conditions 2, 3, and 4 were 70%, 10%, and 38.1%, respectively (fig 2b). Non-induction controls confirmed that induction is a prerequisite for recording (fig 2c).

Table 1 The random test over inducers

Fig 2 The induction readout

(a) Sequencing results of the colonies picked under each condition. “R” stands for “parallel repeat”. Conditions 2, 3, and 4 showed positive knock-out signals.

(b) After induction, the target sequence is supposed to be truncated. The yellow reference line indicates the knock-out sample.

(c) Sequencing results of the non-induction controls. All of them remained intact. N = 24.

### Concentration Matrix

#### First Concentration Matrix

To better characterise the cassette, we kept the induction times at the previously preferred levels (condition 2: L-Arabinose for 22 hours and IPTG for 5 hours) and carried out a concentration matrix test (tbl 2).

Table 2 Group design for the concentration matrix

Arabinose induction strikingly inhibited bacterial growth (fig 3). Although some data are missing because sequencing failed, groups E6, E5, C2, and B5 performed better than the others (tbl 3, fig 4a, tbl 4), and the Lac promoter showed substantial leaky expression (tbl 5, fig 4b). In addition, an electrophoresis-based test provided parallel data for randomly picked groups. The two data sets were strongly correlated (Pearson r = 0.9837, P = 0.0025), and a paired t-test found no significant difference between them (P = 0.7602) (tbl 6, tbl 7, fig 5). Because the electrophoresis test provided far more data for E6 (N = 20) than sequencing did (N = 4), we adjusted the knock-out ratio of E6 to 60%.

Fig 3 Streaking plate of Group 1, after induction

(a) Bacteria without induction. The bacteria had grown to Area 3.

(b-d) Bacteria under 1, 4, 8 g/L L-Arabinose induction, respectively. The growth of the bacteria was limited to Areas 1 and 2. The bacteria in C1 had been contaminated.

(e) Assessment of the amount of bacteria. Adding L-Arabinose markedly affects the amount of bacteria.

Fig 4 Results of the concentration matrix

(a) Electrophoresis results of several selected induction groups. The yellow lines indicate the knock-out band.

(b) Heatmap of the concentration matrix based on sequencing data. Black blocks indicate that no sequencing data are available. The L-Arabinose 0 g/L row and the IPTG 0 g/L column indicate that leaky expression from pBAD is quite low while that from pLac is quite high.

(c) Comparison of the knock-out ratio in the IPTG 0 g/L column with the overall average. Although the ratio increases slightly with L-Arabinose concentration, the presence of L-Arabinose is the dominant factor, implying a high level of pLac leakage.

Table 3 The total sample amount of each condition

Table 4 The groups with high knock-out ratio

Table 5 Results of the no-IPTG-induction group

Table 6 The paired data of the electrophoresis and sequencing

Table 7 The correlation of the electrophoresis and sequencing result

Fig 5 The relation between data from sequencing and electrophoresis

(a) Correlation between the two data sets. Pearson r = 0.9837, R squared = 0.9678, P = 0.0025(**).

(b) Normality test of the two data sets. Under the Shapiro-Wilk test (N = 5), the P values for sequencing and electrophoresis are 0.9500 and 0.9364, respectively.

(c) Paired t-test result.
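The comparison between sequencing and electrophoresis rests on two statistics: the Pearson correlation coefficient and the paired t statistic. The sketch below shows how both are computed with the standard library only. The function names and the example values are placeholders for illustration and are not the measured data behind Fig 5; obtaining a P value additionally requires the t distribution, for example via `scipy.stats.pearsonr` and `scipy.stats.ttest_rel`.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for matched samples."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean_diff / math.sqrt(var_diff / n), n - 1


# Placeholder knock-out ratios for five groups (NOT the measured data).
sequencing = [0.00, 0.25, 0.50, 0.75, 1.00]
electrophoresis = [0.05, 0.20, 0.55, 0.70, 1.00]

r = pearson_r(sequencing, electrophoresis)
t_stat, dof = paired_t(sequencing, electrophoresis)
print(f"r = {r:.4f}, R squared = {r * r:.4f}, t = {t_stat:.4f}, df = {dof}")
```

A high r together with a paired t statistic close to zero is the pattern reported above: the two readouts move together and do not differ systematically.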

#### Second Concentration Matrix

To check the reliability of the data, we repeated the concentration matrix. Growth inhibition and leaky expression still occurred (fig 6 a, b). However, the knock-out ratio of the second concentration matrix was generally lower than that of the first.

Fig 6 Results of the second matrix

(a) Streak plate results of the L-Arabinose / IPTG non-induction groups. As in the previous data, the groups without L-Arabinose grew much better than the L-Arabinose-induced ones. IPTG also slightly inhibited bacterial growth.

(b) Heatmap plotted from the second concentration matrix. The overall colour depth is lower than in the first matrix.

#### Short Summary

To compare the two matrices, we plotted an integrated heatmap based on their mean values (fig 7a). The variation patterns of the two matrices were correlated (fig 7b). A paired t-test of the two matrices showed a significant difference, with P = 0.0054(**) and mean of differences = -0.2071 (fig 7c). Two-way ANOVA of the second matrix and the average matrix indicates that IPTG concentration is the main source of variation (tbl 8).

The data show large variation between measurements, so more experimental data are needed. The similar variation patterns suggest that a regression model could be fitted. The statistical analysis indicates that IPTG concentration is the leading factor controlling the result. In summary, the data reflect the induction events to a certain extent, but establishing an accurate relationship requires more data.

Fig 7 Comparison between the two matrices

(a) Heatmap plotted from the average of the two concentration matrices. Where one matrix has no data, the value from the other matrix is used.

(b) The demonstration of the knock-out ratio. The two matrices showed high correlation.

(c) Paired t test result of the two matrices. There is significant difference, with P = 0.0054(**) and mean of differences = -0.2071.

Table 8 Two-way ANOVA result of the matrices
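The averaging rule used for the integrated heatmap (mean of the two matrices, falling back to whichever matrix has data) can be sketched as follows. This is an illustrative reconstruction of that rule only: the function name `merge_matrices`, the use of `None` for missing cells, and the example values are assumptions, not the team's script or data.

```python
from typing import List, Optional

Matrix = List[List[Optional[float]]]


def merge_matrices(first: Matrix, second: Matrix) -> Matrix:
    """Average two equally shaped matrices cell by cell.

    A cell missing (None) in one matrix takes the value from the other;
    a cell missing in both stays None.
    """
    merged = []
    for row_a, row_b in zip(first, second):
        merged_row = []
        for a, b in zip(row_a, row_b):
            if a is None:
                merged_row.append(b)          # may itself be None
            elif b is None:
                merged_row.append(a)
            else:
                merged_row.append((a + b) / 2)
        merged.append(merged_row)
    return merged


# Placeholder knock-out ratios (NOT the measured data).
matrix_1 = [[0.5, None], [None, 0.25]]
matrix_2 = [[0.25, 0.5], [None, 0.75]]
average = merge_matrices(matrix_1, matrix_2)
print(average)
```

Cells filled from a single matrix rest on half as much data as the averaged ones, which is worth keeping in mind when reading the integrated heatmap.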

### Time-Gradient Test

The knock-out ratio followed a roughly bell-shaped curve over time (tbl 9, fig 8a). The increasing trend ends at the 4 h point, and extending the IPTG induction time beyond that does not improve editing.

Table 9 Group design for the time-gradient test

Fig 8 The knock-out ratio plot of the time-gradient test

### Unexpected Knock-out Exclusion

During the large-scale tests, we encountered several cases of unexpected knock-out (tbl 10, fig 9). In theory, the longer the homologous arm, the higher the recombination ratio. Interestingly, two sets of homologous arms intrinsically flank the DSB target site. However, the preferred unexpected recombination occurred at a low ratio, 0.028, and used the shorter homologous arm. This suggests that the effect of distance is larger than the effect of homologous arm length, and that both effects are weak. This gives us more confidence that the change of cassette is sequential.

Table 10 Statistical data of the unexpected recombination

Fig 9 Sequencing result of E6-1, E6-2, and E6-3

E6-2 and E6-3 showed normal recombination, while E6-1 showed unexpected recombination. The unexpected event results from recombination between the two Lac operators flanking the knock-out target site. The homologous arm actually used in this case is 21 bp, whereas the pT7-Lac promoter pair flanking the DSB is 44 bp.
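Unintended homologous arms such as the repeated Lac operators can be looked for in silico by searching for the longest stretch shared between the sequence upstream and downstream of the DSB. The sketch below uses a standard longest-common-substring search; the function name and the toy sequences are hypothetical and are not taken from the part sequence.

```python
def longest_shared_repeat(upstream: str, downstream: str) -> str:
    """Return the longest substring present in both flanking sequences."""
    best_len, best_end = 0, 0
    previous = [0] * (len(downstream) + 1)
    for i in range(1, len(upstream) + 1):
        current = [0] * (len(downstream) + 1)
        for j in range(1, len(downstream) + 1):
            if upstream[i - 1] == downstream[j - 1]:
                # Extend the match that ended at the previous base of both flanks.
                current[j] = previous[j - 1] + 1
                if current[j] > best_len:
                    best_len, best_end = current[j], i
        previous = current
    return upstream[best_end - best_len:best_end]


# Toy flanks around a DSB (hypothetical sequences).
left_flank = "AAACCCGGGTTT"
right_flank = "TTCCCGGGAA"
repeat = longest_shared_repeat(left_flank, right_flank)
print(repeat, len(repeat))
```

Only the single longest repeat is reported here; a fuller check would list every shared stretch above a length threshold together with its distance from the cut site, since the result above suggests distance matters as well as length.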

### Reference

1. Perli SD, Cui CH, Lu TK. Continuous genetic recording with self-targeting CRISPR-Cas in human cells. Science. 2016;353(6304):aag0511. doi:10.1126/science.aag0511

2. Zhao D, Feng X, Zhu X, Wu T, Zhang X, Bi C. CRISPR/Cas9-assisted gRNA-free one-step genome editing with no sequence limitations and improved targeting efficiency. Sci Rep. 2017;7(1):16624. doi:10.1038/s41598-017-16998-8

### Sequence and Features

- RFC[10]: compatible
- RFC[12]: compatible
- RFC[21]: incompatible (illegal BglII site found at 55)
- RFC[23]: compatible
- RFC[25]: compatible
- RFC[1000]: incompatible (illegal BsaI.rc site found at 94)
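The two incompatibilities listed above come from restriction sites inside the part. A scan of this kind can be sketched as below for the two enzymes named in the list, using their recognition sequences (BglII: AGATCT, BsaI: GGTCTC). The toy sequence is hypothetical and is not the part sequence, and reporting 1-based positions with a `.rc` suffix for reverse-complement hits is an assumption made here to mirror the list.

```python
RECOGNITION_SITES = {"BglII": "AGATCT", "BsaI": "GGTCTC"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq: str):
    """Return sorted (label, 1-based position) pairs for every site found."""
    seq = seq.upper()
    hits = []
    for enzyme, site in RECOGNITION_SITES.items():
        motifs = {enzyme: site}
        rc = reverse_complement(site)
        if rc != site:                      # palindromic sites need one scan only
            motifs[enzyme + ".rc"] = rc
        for label, motif in motifs.items():
            start = seq.find(motif)
            while start != -1:
                hits.append((label, start + 1))
                start = seq.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


toy_sequence = "TTAGATCTAAGAGACCTT"   # hypothetical, for illustration only
print(find_sites(toy_sequence))
```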
