Part:BBa_K3159001

NFX1

1 Usage and Biology

MHC (Major Histocompatibility Complex) class II molecules are crucial in immune cell communication. They inform other cells of impending danger so that those cells can prepare and build the necessary defense against a virus, restraining it from spreading. Expression of MHC class II genes is primarily controlled by transcription factors that bind to the X and Y boxes. NFX1 is one of these transcription factors. This sequence codes for a transcriptional repressor protein that, in vitro, binds the highly conserved X-box motif of MHC class II genes to regulate their expression. Malfunction of this protein may decrease the sensitivity of MHC class II-expressing cells in the inflammatory immune response induced by interferon-gamma, shortening the period in which a defense can be built against antigens.

2 Plasmid construction

2.1 pLEX-MCS Vector

pLEX is a plasmid, a DNA molecule capable of self-replication. This type of vector is often used in genetic engineering to deliver a gene of interest into a cell, resulting in over-expression of the targeted gene. In other words, it is the route our gene of interest has to take in order to be expressed in a cell, so constructing plasmids is a necessary step toward obtaining the protein coded by the inserted gene. In our experiment, the pLEX used for all target genes is the pLEX-MCS Empty Vector. One benefit of this artificial vector is that it contains multiple restriction enzyme cutting sites, which is ideal for designing primers for the target gene, for transformation into E. coli, and for subsequent transfection.

Figure 1: structure of pLEX-MCS used

2.2 Primer Design (PCR & Restriction Digestion)

For our co-transformation experiment design involving EGFP and NFX1, we built double-digestion sites into our primers. We selected XhoI and MluI as the EGFP cutting sites, and SpeI and XhoI as the NFX1 cutting sites.

The Tm (melting temperature) of each primer was calculated using SnapGene, which helps in setting up the PCR. Once amplified, both genes can be inserted into pLEX-MCS using DNA ligase, creating our plasmids of interest.
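As a rough illustration of what a primer Tm estimate involves, the sketch below applies the Wallace rule (2 °C per A/T, 4 °C per G/C), a common rule of thumb for short oligos. This is an assumption made for illustration only: it is not the algorithm SnapGene uses, and the primer sequence shown is hypothetical, not one of our actual primers.

```python
def wallace_tm(primer: str) -> int:
    """Estimate primer Tm in degrees Celsius with the Wallace rule.

    Tm = 2 * (A + T) + 4 * (G + C); only a rough guide for short oligos.
    """
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be non-empty and contain only A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    seq = primer.upper()
    if not seq:
        raise ValueError("primer must be non-empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical 18-nt primer, for illustration only.
example_primer = "ATGCGTACGTTAGCCTAG"
print(wallace_tm(example_primer), gc_fraction(example_primer))
```

Forward and reverse primers are usually chosen so that their estimated Tm values are close to each other, which is what makes a single annealing temperature workable.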

2.3 Bacterial Transformation & Selection

To propagate and isolate the plasmids, we transformed them into DH5-alpha cells. We picked DH5-alpha because these E. coli cells are specifically engineered to maximize transformation efficiency. The cells are stored at -80 degrees Celsius and are revived before use. Together with our constructed plasmids of interest, they were placed in 800 µl of liquid lysogeny broth (LB) without ampicillin in an incubator shaker for an hour at 220 rpm and 37 degrees Celsius. This step allows the bacteria to take up the plasmid and express the gene that codes for ampicillin resistance. Afterwards, centrifugation was used to pellet the bacteria that potentially contain our plasmid of interest and to discard the spent medium. To select the bacteria that had taken in the plasmid carrying antibiotic resistance, we cultured them on solid LB medium with ampicillin overnight. From the resulting colonies, we selected a few and extracted their plasmids. To confirm that these were the plasmids of interest, we sent them for DNA sequencing.

3 Cell Cultivation

3.1 HeLa Cells

In our experiment investigating cervical cancer, we chose HeLa cells, the oldest and most commonly used human cell line in scientific research. These immortal cells were taken in 1951 from Henrietta Lacks, a cancer patient.

3.2 Thawing & Cell Culture

The cells are stored in liquid nitrogen at -196 degrees Celsius. When taken out, they are bathed in 37 degree Celsius water for 2 minutes and added to 4-inch petri dishes with 10 ml of cell culture medium made from 500 ml DMEM + 50 ml FBS (nominally 10%) + 5.5 ml Streptomycin/Penicillin (to avoid bacterial growth). We used eighteen 4-inch petri dishes for three groups of experiments, 6 dishes each. Because of cell metabolism during growth, the medium must be exchanged at least every 24 hours.
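The medium recipe can be checked with a short calculation. The sketch below takes the three volumes stated above and reports each component's share of the final mixture; the function name and dictionary keys are illustrative choices, not part of any protocol.

```python
def volume_fractions(components: dict) -> dict:
    """Return each component's fraction of the total mixed volume."""
    total = sum(components.values())
    if total <= 0:
        raise ValueError("total volume must be positive")
    return {name: volume / total for name, volume in components.items()}


# Volumes in ml, as stated in the recipe above.
medium_ml = {"DMEM": 500.0, "FBS": 50.0, "Pen/Strep": 5.5}

for name, fraction in volume_fractions(medium_ml).items():
    print(f"{name}: {fraction:.1%} of {sum(medium_ml.values())} ml")
```

The "10%" for FBS is therefore nominal, relative to the DMEM volume: as a share of the final 555.5 ml mixture it is about 9%, and the antibiotic mix is about 1%.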

3.4 Cell Passaging

Roughly every three days, the cells reach a density at which they are better off being split into several dishes. First we rinsed the cells with PBS, then added 1 ml of trypsin (a serine protease) to degrade the proteins that attach the cells to the surface of the petri dish, so that the cells round up and detach. After 60 seconds, DMEM must be added to stop the protein degradation, and all liquid in the dish is transferred into EP tubes for centrifugation (1000 g, 1 min). The supernatant is discarded, and the cell pellet is resuspended and redistributed into three separate petri dishes.

3.5 Transfection

Solution concentration: Opti-MEM 20%, Lipofectamine 3000 1:3 (6 µl of lipofection reagent, used to boost the efficiency of the transfection process)

Total system: 6 ml in a 4-inch petri dish

Note that the cell culture medium is changed to DMEM only, without the serum and antibiotic supplements, to promote endocytosis of the vectors into the cells.

After roughly 48 hours, the protein coded by the inserted gene will be fully expressed, and further experiments can be carried out.

4 SDS-PAGE Analysis (Western Blot)

Figure 2: SDS-PAGE separates proteins based on molecular weight. Near the marker line at roughly 150 kD, NFX1 (123 kD) is found to be overexpressed in cells carrying pLEX with the NFX1 insert. All three pLEX-NFX1 samples show stronger bands than those with the empty vector (pLEX-EV), which indicates that our expression of NFX1 protein was successful.

Figure 3: Western Blot immunoblotting. Empty-vector and NFX1 samples were loaded in alternating lanes to display the contrast fully. A ladder was used to locate NFX1 and actin on the polyacrylamide gel before the sections were cut out for imaging.

Results:

Here our team found a list of proteins that are highly likely to be associated with over-expression of NFX1-123.

HeLa cells were transfected either with empty vectors or with plasmids containing the NFX1 and GFP genes. After the inserted genes were fully expressed, we ran Western Blot, IP-MS, and FASP to determine and confirm the differential proteins.

After careful analysis, the combined statistical outcomes of the IP and FASP mass spectrometry indicated significant changes in the expression of 280 proteins caused by NFX1 overexpression. Our findings may serve as convenient biomarkers to identify people carrying over-expressed NFX1, signal a high risk of cervical cancer, and help prevent cervical cancer in its early stages.

We found in controlled in vitro assays that over-expression of NFX1-123 results in different concentrations of certain proteins compared to normal cells. Importantly, we used three distinct methods to confirm and filter the results, which supports a strong association between NFX1-123 expression and the proteins found.

Our results indicate proteins potentially engaged in HPV infection, suggesting a new way to identify people at high risk from HPV even before they are actually infected. Since cervical cancer can only be cured in its early stages, this evaluation by protein concentration may serve as an important sign that helps us prevent and combat cervical cancer well before it becomes uncontrollable. We also anticipate our assay to be a starting point for further research on the mechanisms of HPV. For example, a specific protein associated with the NFX1 gene could be tested to examine the role of that protein in HPV infection.

1. Primary Results

1.1 Sample Information

We prepared a total of 18 dishes of HeLa cells. Each experiment used 6 dishes, consisting of 3 dishes of normal cells and 3 dishes of cells overexpressing NFX1 protein.

1.2 Protein Identification Counts

Total spectra: The number of secondary (MS2) mass spectra.

Spectra: The number of spectra matched with identified peptides.

Peptides: The total number of distinct peptide sequences identified in the protein group.

Protein groups: The identified protein groups. A protein group consists of one master protein, identified by a set of peptides that are not all included in any other protein group, together with all proteins that are identified by the same set or a subset of those peptides.

2. Proteomics & MS Results

2.1 Protein Quantification Assay & SDS-PAGE

2.2 Distribution of Peptide Score

The x-axis represents the peptide score assigned by MaxQuant; the y-axis represents the number of peptides with scores within the corresponding range. The MaxQuant scores of the MS2 spectra are generally ideal.

The distribution in the histogram above is approximately normal; we therefore speculate that the variation in scores is random.

2.3 Distribution of Proteins’ Molecular Weights

The x-axis represents the proteins' molecular weights; the y-axis indicates the number of proteins that have a molecular weight in the corresponding range. The distribution in the histogram above is unimodal and strongly positively skewed. This means that the majority of the proteins detected have a relatively small molecular weight.

2.4 Distribution of Peptides’ Lengths

The x-axis represents the length of the peptides in amino acids; the y-axis represents the frequency of identified peptides. The distribution in the histogram above is unimodal and positively skewed. This shows that shorter peptides were detected more often than longer ones. The majority of the peptides detected have a length of three to thirteen amino acids.

2.5 Sequence Coverage of Protein Groups

The percent coverage is calculated by dividing the number of amino acids in all found peptides by the total number of amino acids in the entire protein sequence. This histogram illustrates the distribution of proteins with respect to their peptide coverage. The x-axis represents the percentage of each protein's sequence that is covered; the y-axis represents the count of identified proteins. From the graph, we can see that a decent number of proteins have a coverage higher than 70%, with some even near the 90% range. According to the School of Medicine at the University of Virginia, 70% coverage represents a very successful protein analysis, so these values indicate a successful experiment.
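The coverage calculation described above can be sketched as follows. This is a minimal illustration rather than the code used by our analysis software: it assumes that residues covered by overlapping peptides are counted once, and the protein and peptide strings are made-up examples.

```python
def sequence_coverage(protein: str, peptides: list) -> float:
    """Percent of protein residues covered by at least one peptide."""
    if not protein:
        raise ValueError("protein sequence must be non-empty")
    covered = [False] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue
        # Mark every occurrence of the peptide in the protein sequence.
        start = protein.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein)


# Made-up 10-residue protein with two overlapping peptides.
print(sequence_coverage("MKTAYIAKQR", ["MKTAY", "TAYIA"]))
```

Counting each residue once keeps the coverage at or below 100% even when many peptides map to the same region.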

There are several reasons why an analysis does not find all amino acids.

• protein does not digest well

• peptides too hydrophilic or small: they pass through the reverse phase column with salt and are not analyzed

• peptides too large/hydrophobic: they stick in gel, adsorb to tubes, do not elute from column, or are too large for the mass spectrometer to analyze because of poor fragmentation

• peptides fragment in ways which cannot be analyzed. Many spectra in an analysis cannot be interpreted. Some spectra only give limited data; proline, histidine, internal lysine and arginine are some reasons peptides do not give complete fragmentation data.

2.6 Distribution of Number of Identified Peptides of Proteins

This graph displays the distribution of identified proteins with respect to the number of matching peptides. The x-axis represents the number of peptides matched with the identified protein, and the y-axis represents the number of proteins.

2.7 Identified Protein Counts

● 4673 unique proteins were identified from the 6 samples.

● After filtering out proteins flagged in the reverse and potential-contaminant columns, as well as proteins with fewer than four data points out of six, 2420 proteins were kept.

● Of these, 575 proteins have a p value less than or equal to 0.05, meaning the difference between the two groups is statistically significant at that level.

● Within the 575 proteins, 133 have a fold change (Overexpression/Normal) greater than 1.2 (upregulation), and 147 have a fold change (Normal/Overexpression) greater than 1.2 (downregulation).
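The filtering step in the second bullet can be sketched as below. The record layout (keys `reverse`, `contaminant`, `intensities`) is a hypothetical simplification introduced for this example, not the actual output format of our software, and the rows are toy data.

```python
def keep_protein(row: dict, min_valid: int = 4) -> bool:
    """Apply the filters described above to one protein record.

    A record is dropped if it is flagged as a reverse hit or a potential
    contaminant, or if fewer than `min_valid` of its samples have a
    quantified (non-missing, positive) intensity.
    """
    if row["reverse"] or row["contaminant"]:
        return False
    valid = [v for v in row["intensities"] if v is not None and v > 0]
    return len(valid) >= min_valid


# Toy records with six samples each (hypothetical values).
rows = [
    {"id": "P1", "reverse": False, "contaminant": False,
     "intensities": [10.0, 12.0, 11.0, 20.0, 22.0, 21.0]},
    {"id": "P2", "reverse": True, "contaminant": False,
     "intensities": [10.0, 12.0, 11.0, 20.0, 22.0, 21.0]},
    {"id": "P3", "reverse": False, "contaminant": False,
     "intensities": [10.0, None, None, 20.0, None, 21.0]},
]

kept = [row["id"] for row in rows if keep_protein(row)]
print(kept)
```

Here P2 is removed as a reverse hit and P3 because only three of its six samples carry a value.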

2.8 Boxplots of Protein Quantification Assay

The x-axis shows the name of each sample, and the y-axis represents the multiple of the median (MoM) of the LFQ (Label-Free Quantification) intensity, on a log base 2 scale. The plot shows each sample's deviation from the median value.
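A minimal sketch of the MoM transformation is given below. It assumes that the median is taken per protein across the samples, which is one plausible reading of the description and may not match the exact normalization used for the figure; the intensities are toy values.

```python
import math
import statistics


def log2_mom(intensities: list) -> list:
    """Return log2(intensity / median) for one protein across samples.

    Assumption: the median is computed across the samples of one protein.
    """
    median = statistics.median(intensities)
    if median <= 0 or any(v <= 0 for v in intensities):
        raise ValueError("intensities must be positive")
    return [math.log2(v / median) for v in intensities]


# Toy intensities for one protein in three samples.
print(log2_mom([1.0, 2.0, 4.0]))
```

On this scale a value of 0 means the sample sits at the median, and each unit corresponds to a two-fold difference.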

2.9 Correlation Matrix of Protein Quantification

The lower left of the graph contains scatterplots of protein intensities for each pair of samples. The upper right contains the corresponding Pearson correlation coefficients. The diagonal labels the samples.
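For reference, the Pearson correlation between two samples can be computed as in the sketch below. The input lists stand for paired protein intensities from two samples and are toy values, not our measurements.

```python
import math


def pearson_r(x: list, y: list) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Toy paired intensities from two samples.
print(pearson_r([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```

Values close to 1 between replicate samples suggest that the quantification is reproducible.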

2.10 Volcano Plots of all identified proteins

Fold change (FC) = protein content (Overexpression) / protein content (Normal)

Red: Overexpression/Normal fold change > 1.2, n = 133.

Green: Normal/Overexpression fold change > 1.2, n = 147.
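The classification behind the two colored groups can be sketched as follows, using the 1.2 fold-change cutoff and the p value cutoff of 0.05 stated earlier. The log2 and -log10 axes are the usual volcano-plot convention and are assumed here; the labels "up", "down", and "ns" and the input values are illustrative.

```python
import math

FC_CUTOFF = 1.2   # fold-change threshold stated in the text
P_CUTOFF = 0.05   # significance threshold stated in the text


def volcano_point(mean_over: float, mean_normal: float, p_value: float):
    """Return (log2 FC, -log10 p, label) for one protein.

    FC is Overexpression / Normal. "up" corresponds to the red group,
    "down" to the green group, and "ns" to everything else.
    """
    fc = mean_over / mean_normal
    x = math.log2(fc)
    y = -math.log10(p_value)
    if p_value <= P_CUTOFF and fc > FC_CUTOFF:
        label = "up"
    elif p_value <= P_CUTOFF and 1.0 / fc > FC_CUTOFF:
        label = "down"
    else:
        label = "ns"
    return x, y, label


# Hypothetical group means and p values.
print(volcano_point(2.0, 1.0, 0.01))
print(volcano_point(1.0, 1.5, 0.03))
print(volcano_point(1.1, 1.0, 0.01))
```

Note that a Normal/Overexpression ratio above 1.2 is the same as an Overexpression/Normal fold change below 1/1.2, which is why the "down" test uses the reciprocal.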

2.11 Heatmap of Differential Proteins (Clustering)

2.12 Principal Component Analysis of Differential Proteins (PCA)

Using the differential proteins, the two groups can be clearly and easily separated.

Sequence and Features

Assembly Compatibility:

COMPATIBLE WITH RFC[10]
COMPATIBLE WITH RFC[12]
INCOMPATIBLE WITH RFC[21]: Illegal BamHI site found at 960
COMPATIBLE WITH RFC[23]
COMPATIBLE WITH RFC[25]
COMPATIBLE WITH RFC[1000]