Part:BBa_K4737002

Anti-CEA scFv

This part encodes an anti-CEA single-chain variable fragment (scFv) antibody from Mus musculus (GenBank: BAA19944.1).

Usage and Biology

Although genetically modified S. typhimurium VNP20009 is a useful vehicle for cancer therapy and vaccine development, it still exhibits limited tumor targeting in vivo, so the ability of the engineered bacteria to target tumor cells needs to be enhanced. Carcinoembryonic antigen (CEA) is abundantly expressed in a wide range of human carcinomas, including gastrointestinal tract, pancreatic, non-small cell lung, and breast cancers, which makes it a common therapeutic target. OmpA is one of the main outer membrane proteins of Gram-negative bacteria and can serve as a carrier for displaying foreign antigens on the surface of Gram-negative bacteria, including Salmonella spp. The Lpp-OmpA expression system takes advantage of the efficient targeting of OmpA to the outer membrane and allows passenger proteins fused to its C-terminus to be displayed on the cell surface. Based on these studies, we proposed displaying high-affinity CEA-specific single-chain antibody fragments (scFv) on the surface of the bacteria.

Figure 1: The Lpp-OmpA-scFv expression pathway.

A GFP tag was added to the C-terminus of Lpp-OmpA-scFv on the plasmid in order to characterize the expression of the anti-CEA scFv. Western blotting confirmed that the Lpp-OmpA-scFv-GFP fusion protein was expressed in VNP20009 (Fig. 2).

Figure 2: Western blot (WB) analysis of the expression of the specific single-chain antibody fragment (scFv). The molecular weight of Lpp-OmpA-scFv-GFP is about 66 kDa; GAPDH, a reference protein in cells, has a molecular weight of 36 kDa.

We chose the human gastric cancer cell line NUGC-3, which has high CEA expression, as the experimental group, and the human gastric cancer cell line BGC-823, which has low CEA expression, as the CEA-negative control. Engineered bacteria carrying a GFP tag and negative-control bacteria carrying an RFP tag were used to infect both cell lines simultaneously, and the function of the anti-CEA scFv was assessed from the infection efficiency of the bacteria.

PRACTICE 1 First, we infected cells with engineered bacteria carrying the pFPV25.1-[Lpp-OmpA-scFv]-GFP vector and found that the green fluorescence of the engineered bacteria was very weak.

Figure 3: Diagrammatic sketch of Practice 1.

DEBUG 1 By reviewing the literature, we found that the VH and VL of an scFv are connected by a linker, and that transient dissociation of VH and VL destabilizes the scFv and the downstream GFP protein [1].

PRACTICE 2 In the second attempt, the constructed plasmids were electroporated into competent VNP20009 cells that stably expressed GFP. We infected NUGC-3 and BGC-823 cells simultaneously with the engineered bacteria and the negative control. In NUGC-3 cells, which have high CEA expression, the infection efficiency of the engineered bacteria (green fluorescence) was higher than that of the negative control (red fluorescence), whereas in BGC-823 cells, which have low CEA expression, there was no significant difference in infection efficiency (Fig. 4).

Figure 4: Fluorescence microscopy of BGC-823 and NUGC-3 cells 90 minutes after co-infection with the engineered bacteria and VNP20009. Both the engineered bacteria and the negative control were used at an MOI of 1:50. Red frames mark engineered bacteria that infected cells.

DEBUG 2 In this infection, we wanted to determine whether scFv really worked by comparing the efficiency of different bacteria in infecting NUGC-3 and BGC-823. However, we could not exclude the possibility that there was an interaction between the engineered bacteria and the negative control that affected their infection efficiency, which meant that further improvement of our experiments was necessary.

PRACTICE 3 In this attempt, in addition to infecting the cells with the two bacteria simultaneously, we added control experiments in which the engineered bacteria and the negative control infected the cells separately. However, NUGC-3 cells grew slowly, were in poor condition, and died in large numbers shortly after infection with the engineered bacteria. As a result, many engineered bacteria were washed off with PBS along with the dead cells during the infection process, which made observation difficult. Therefore, we selected the human colorectal cancer cell line LS174T, which also has high CEA expression, as the experimental group. Analysis of the infection of LS174T and BGC-823 cells led to the same conclusions as above, whether the engineered bacteria and the negative control infected the cells separately or simultaneously (Fig. 5-7).

Figure 5: Fluorescence microscopy of LS174T cells 2 hours after scFv delivery by bacterial infection. Both the engineered bacteria and the negative control were used at an MOI of 1:50.

Figure 6: Micrographs of BGC-823 cells 2 hours after scFv delivery by bacterial infection. Both the engineered bacteria and the negative control were used at an MOI of 1:50.

Figure 7: Micrographs of BGC-823 and NUGC-3 cells 2 hours after co-infection with the engineered bacteria and VNP20009. Both the engineered bacteria and the negative control were used at an MOI of 1:50.

To conclude, anti-CEA single chain antibody fragments (scFv) can be expressed on the surface of VNP20009 and effectively enhance the targeting of tumor cells by engineered bacteria.

Protein Molecular Modeling

1. Introduction
We conducted a molecular modeling analysis of the single-chain variable fragment (scFv) used in this project to investigate the interaction between the scFv and the carcinoembryonic antigen (CEA) present on the surface of tumor cells. We used molecular docking and related calculations to predict the structure of the antibody and its affinity for CEACAM5. To assess the practical value of our antibody, we examined the developability of the scFv, including its aggregation and stability. Traditionally, researchers have reduced the immunogenicity of mouse antibodies by grafting the CDRs of mouse antibodies onto a human variable region framework, thus creating humanized antibodies[1]. Using the original sequence as a reference, we used modeling to identify the key amino acid sites involved in antigen binding. Through virtual mutation of these sites, we aimed to enhance the affinity of the anti-CEA scFv and give the engineered bacteria stronger tumor cell targeting ability. Overall, the protein molecular modeling process comprises structure prediction of Lpp-OmpA-scFv and molecular docking with CEACAM5, followed by modification of the antibody based on an understanding of the complex structure (Figure 8).

Figure 8: Protein molecular modeling workflow.

2. Structure prediction of Lpp-OmpA-scFv

(i) Background
Since the structure of Lpp-OmpA-scFv is unknown, our first step was to predict the structure of the fusion protein. AlphaFold2 (AF2), a deep learning model, has achieved unprecedented performance in predicting the structures of single-chain proteins[2]. UCSF ChimeraX is a molecular visualization tool developed by the Resource for Biocomputing, Visualization, and Informatics (RBVI) as the successor to UCSF Chimera. We used the AlphaFold tool in ChimeraX, which runs ColabFold, to predict the new protein structure[3].

(ii) Methodology
We obtained the amino acid sequence of the Lpp-OmpA-scFv protein as a FASTA file. We then opened the AlphaFold panel in ChimeraX and pasted the protein's amino acid sequence into the designated box. In the Options, we selected "Energy-minimize predicted structures," "Trim fetched structure to the aligned structure sequence," and "Use PDB templates when predicting structures." Selecting all options may lengthen the processing time, but it can yield more accurate predictions. We clicked the Predict button to start the run. The 3D structure of the protein was predicted using ColabFold, which runs on Google Colaboratory, a free computational environment provided by Google.

(iii) Results
After analyzing the prediction results, we downloaded the best-ranked prediction (Figure 9.A). We assessed the reliability of each segment of the predicted structure using the additional plots that were generated (Figure 9.B, C, D). The AlphaFold prediction provides an expected position error for each residue pair (X, Y), which is the predicted position error at residue X when the prediction is aligned with the true structure on residue Y. These residue-residue "predicted aligned error" values can be visualized in an error plot (Figure 9.B). In addition, a sequence coverage plot was generated to show the number of similar sequences found at each position of Lpp-OmpA-scFv (Figure 9.C). The predicted structures include atomic coordinates and a confidence estimate for each residue, with scores ranging from 0 to 100; higher scores indicate higher confidence. This confidence measure is called pLDDT and corresponds to the model's predicted per-residue score on the lDDT-Cα metric (Figure 9.D).
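As a rough illustration of how the per-residue confidence can be inspected outside ChimeraX, the sketch below relies on the fact that AlphaFold-style PDB files store pLDDT in the B-factor column. It uses only the Python standard library. The function names and the cutoff of 70 (a commonly used threshold for "confident" residues) are assumptions for illustration and were not part of our workflow.

```python
def plddt_per_residue(pdb_text):
    """Map (chain, residue number) -> pLDDT, read from the B-factor column.

    Only C-alpha ATOM records are used, so each residue contributes one value.
    Column positions follow the fixed-width PDB ATOM record layout.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain_id = line[21]
            residue_number = int(line[22:26])
            scores[(chain_id, residue_number)] = float(line[60:66])
    return scores


def summarize_plddt(scores, cutoff=70.0):
    """Return the mean pLDDT and the residues scoring below `cutoff`.

    The default cutoff is an illustrative assumption, not a project setting.
    """
    if not scores:
        raise ValueError("no C-alpha ATOM records found")
    mean_score = sum(scores.values()) / len(scores)
    low_confidence = [key for key, value in scores.items() if value < cutoff]
    return mean_score, low_confidence
```

To use it, read the downloaded model into a string (the file name is whatever was saved from the prediction) and pass it to `plddt_per_residue`, then to `summarize_plddt`. Residues returned in the low-confidence list are the ones whose placement in Figure 9.A should be interpreted with the most caution.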

Figure 9.A: Prediction of the Lpp-OmpA-scFv protein structure with AlphaFold2. Best-ranked predicted structure of Lpp-OmpA-scFv.

Figure 9.B: Prediction of the Lpp-OmpA-scFv protein structure with AlphaFold2. Predicted aligned error plot.

Figure 9.C: Prediction of the Lpp-OmpA-scFv protein structure with AlphaFold2. Sequence coverage plot.

Figure 9.D: Prediction of the Lpp-OmpA-scFv protein structure with AlphaFold2. Predicted lDDT (pLDDT) per position plot.

(iv) Analysis
To predict the transmembrane regions of the predicted structure, we added an implicit membrane to the protein structure and adjusted the membrane properties using the Analyze Transmembrane Proteins tools in Discovery Studio. A Hidden Markov Model (HMM) was used to predict the transmembrane helices from the amino acid sequence of the protein. Subsequently, an implicit membrane consisting of two parallel planes was added to the protein structure (Figure 10). The placement of the membrane was determined by optimizing a simplified solvation energy. If there was a significant charge difference between protein residues located outside the membrane, the membrane was adjusted accordingly.

Figure 10: Transmembrane protein regions prediction of Lpp-OmpA-scFv. The angle of inclination is defined as the angle between the first principal axis of the protein and the normal of the membrane, and the angle of rotation is the angle between the second principal axis of the atomic set and the normal of the plane defined by the first principal axis and the normal of the membrane.
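The tilt (inclination) angle defined in the caption can be sketched in a few lines of Python. This is a simplified illustration of the geometric definition only, not the algorithm used by Discovery Studio. The function names, the power-iteration approach, the starting vector, and the default membrane normal along z are all assumptions made for the example.

```python
import math


def first_principal_axis(coords, iterations=200):
    """Approximate the first principal axis of a set of 3D points.

    Uses power iteration on the 3x3 covariance matrix. The fixed start
    vector is an arbitrary choice; in the unlikely case that it is exactly
    orthogonal to the true axis, the result would be wrong.
    """
    n = len(coords)
    center = [sum(c[i] for c in coords) / n for i in range(3)]
    centered = [[c[i] - center[i] for i in range(3)] for c in coords]
    cov = [[sum(p[i] * p[j] for p in centered) / n for j in range(3)]
           for i in range(3)]
    v = [1.0, 0.7, 0.3]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("points have no spatial extent")
        v = [x / norm for x in w]
    return v


def tilt_angle_deg(coords, membrane_normal=(0.0, 0.0, 1.0)):
    """Angle in degrees between the first principal axis and the membrane normal.

    The sign of a principal axis is arbitrary, so the absolute value of the
    cosine is used and the result lies between 0 and 90 degrees.
    """
    axis = first_principal_axis(coords)
    normal_length = math.sqrt(sum(x * x for x in membrane_normal))
    cosine = abs(sum(a * b for a, b in zip(axis, membrane_normal))) / normal_length
    return math.degrees(math.acos(min(1.0, cosine)))
```

A protein whose long axis is parallel to the membrane normal gives a tilt close to 0 degrees, while one lying flat in the membrane plane gives a tilt close to 90 degrees.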

3. Molecular docking

(i) Background
We used molecular docking to investigate the interaction between the antigen protein and the scFv. Carcinoembryonic antigen-related cell adhesion molecule 5 (CEACAM5), also known as CEA or CD66e, belongs to the carcinoembryonic antigen family[4]. ZDOCK is a protein-protein docking program based on the fast Fourier transform. It explores potential binding modes by translating and rotating the two proteins in space and then evaluates each binding mode with an energy-based scoring function. ZDOCK version 3.0.2 combines the IFACE statistical potential, shape complementarity, and electrostatics in its scoring function. We used ZDOCK 3.0.2 to dock CEACAM5 with Lpp-OmpA-scFv and selected the top-ranked docking result. PyMOL V2.4.0 was used to label and present the binding sites of the docking complex.

(ii) Methodology
The structure of the human CEACAM5 protein was retrieved from the PDB database (https://www.rcsb.org/). The Lpp-OmpA-scFv structure predicted in the previous step was prepared for docking. We uploaded both protein structures and ran the ZDOCK 3.0.2 prediction. We then downloaded and installed PyMOL V2.4.0. After docking was complete, we opened the top-ranked complex structure file in PyMOL. To remove all solvent molecules and display as sticks the residues of each protein within 5 angstroms of the other, we entered the following commands at the PyMOL command line, where CEACAM5 and scFv are the selection names of the two proteins.
PyMOL>remove solvent
PyMOL>bg_color white
PyMOL>show sticks, byres CEACAM5 within 5 of scFv
PyMOL>show sticks, byres scFv within 5 of CEACAM5
Afterward, we searched for and selected the residues in contact across the interface and used them to label and display the binding sites of the docking complex, with particular emphasis on amino acids capable of forming hydrogen bonds at the interaction surface. We then uploaded the docking complex to PDBePISA to analyze the interaction domains and surfaces of the two proteins.
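The distance-based selection used in the PyMOL commands can also be reproduced in a script, which is convenient for listing the interface residues as text. The following is a minimal sketch under stated assumptions: it reads standard fixed-width PDB ATOM records, and the chain identifiers passed in are hypothetical, since they depend on how the docked complex file labels the two proteins. The function names are illustrative.

```python
import math


def parse_atoms(pdb_text):
    """Return (chain, residue name, residue number, x, y, z) per ATOM record."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            atoms.append((
                line[21],
                line[17:20].strip(),
                int(line[22:26]),
                float(line[30:38]),
                float(line[38:46]),
                float(line[46:54]),
            ))
    return atoms


def interface_residues(atoms, chain_a, chain_b, cutoff=5.0):
    """Residues of chain_a with at least one atom within `cutoff` angstroms
    of any atom of chain_b, similar in spirit to a by-residue distance
    selection."""
    side_a = [atom for atom in atoms if atom[0] == chain_a]
    side_b = [atom for atom in atoms if atom[0] == chain_b]
    hits = set()
    for atom_a in side_a:
        for atom_b in side_b:
            if math.dist(atom_a[3:], atom_b[3:]) <= cutoff:
                hits.add((atom_a[1], atom_a[2]))
                break
    # Sort by residue number for a stable, readable listing.
    return sorted(hits, key=lambda residue: residue[1])
```

Calling `interface_residues(atoms, "A", "B")` and then `interface_residues(atoms, "B", "A")` gives the two sides of the interface, assuming the two proteins are stored as chains A and B. The all-against-all loop is quadratic in the number of atoms, which is acceptable for a single complex but would need a spatial grid for large batches.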

(iii) Results
The amino acids that can form hydrogen bonds in the docked complex are shown in Figure 11.B. Gray dotted lines indicate hydrogen bonds between amino acid residues, capital letters are the one-letter codes of the corresponding amino acids, and numbers give the positions of the amino acids in the protein sequence. For example, S362 is serine 362 of the CEACAM5 protein, and G130 is glycine 130 of the Lpp-OmpA-scFv protein. We uploaded the best predicted complex structure to PDBePISA, analyzed the interaction between the two proteins, and downloaded the summary of the interaction surface (Table 1).

Figure 11.A: The docking of antibody molecules with antigen molecules. Results of molecular docking, Lpp-OmpA-scFv is shown in green and CEACAM5 is shown in orange.

Figure 11.B: The docking of antibody molecules with antigen molecules. Schematic representation of the molecular docking surface. Among them, amino acids that can form hydrogen bonds are shown.

Table 1: Summary of the docking complex interface

4. Developability analysis of antibodies

(i) Background
High concentrations of antibodies often lead to aggregation, which can diminish their activity and potentially induce an immune response[5]. The aggregation propensity score measures the tendency of amino acids on the protein surface to aggregate[6]. Identifying the surface regions that are prone to aggregation allows targeted mutations to be used to engineer proteins with higher stability. Aggregation of scFv proteins is also an important factor affecting their developability. We calculated DI(Fv) scores to assess the developability of the scFv.

(ii) Methodology
We used the Apply Forcefield command in the Change Forcefield tools to apply a CHARMm forcefield to the Lpp-OmpA-scFv protein structure. We then calculated the solvent accessible surface area (SAA) of the side chain of each residue in Discovery Studio. In the Tools Explorer, we opened the Predict Protein Aggregation tool panel under Macromolecules and clicked "Calculate Aggregation Scores". In the parameter settings that appeared, we chose the proteins for the calculation and set the Cutoff Radius to 5, 7, and 10. The per-atom aggregation propensity score is calculated by taking, for every residue within the Cutoff Radius of the atom, the ratio of the actual side-chain SAA to the SAA of the side-chain atoms in a fully exposed residue of the same type, multiplying it by the residue hydrophobicity, and summing over those residues. We used the Developability Index (DI) to analyze the anti-CEA scFv protein. The DI is a measure used to rank the aggregation propensity of related proteins based on their hydrophobic and total charge properties. It is calculated as the aggregation propensity score (APS) minus the weighted square of the total charge, as described by the following formula:
DI = [APS] - β × [total charge]²
where APS is the positive aggregation score, calculated as the sum of all positive atomic aggregation scores, and β is the weighting factor applied to the squared total charge.
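The calculation described above can be written out as a short Python sketch. It follows the verbal definitions of the per-atom score, the APS, and the DI given in this section; the function names are illustrative, and the value of β is deliberately left as a required argument because it is not stated here. This is not the Discovery Studio implementation.

```python
def atom_aggregation_score(neighbor_residues):
    """Per-atom aggregation propensity score.

    `neighbor_residues` lists, for each residue within the Cutoff Radius of
    the atom, a tuple of (side-chain SAA, side-chain SAA of the fully
    exposed residue, residue hydrophobicity).
    """
    return sum(
        (saa / saa_fully_exposed) * hydrophobicity
        for saa, saa_fully_exposed, hydrophobicity in neighbor_residues
    )


def aggregation_propensity_score(atomic_scores):
    """APS: the sum of all positive per-atom aggregation scores."""
    return sum(score for score in atomic_scores if score > 0)


def developability_index(atomic_scores, total_charge, beta):
    """DI = APS - beta * (total charge)^2.

    `beta` must be supplied by the caller; no default is assumed here.
    """
    aps = aggregation_propensity_score(atomic_scores)
    return aps - beta * total_charge ** 2
```

Because the charge term is squared, a net charge of +2 and a net charge of -2 lower the DI by the same amount for a given β.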

(iii) Results
The structure is colored by the aggregation propensity score calculated with a Cutoff Radius of 10: atoms with a high score are shown in red and atoms with a low score in blue (Figure 12.C). A line chart and a point chart are also created automatically (Figure 12.A, Figure 12.B). In the line chart, the horizontal axis is the amino acid sequence number and the vertical axis is the aggregation propensity score. In the point chart, the horizontal axis is the index of the predicted aggregation site and the vertical axis is the aggregation propensity score. Site 1 is shown in red in Figure 12.C; to show its amino acids in more detail, we clicked the arrow to the right of Display Style at the top of the window (Figure 12.D). Because the anti-CEA scFv is an antibody fragment with variable regions, the APS was calculated for these regions to generate DI(Fv) scores (Table 2).

Figure 12.A: Protein aggregation propensity scores calculation using CHARMm. Line chart of protein aggregation tendency score.

Figure 12.B: Protein aggregation propensity scores calculation using CHARMm. Point chart of aggregation tendency score.

Figure 12.C: Protein aggregation propensity scores calculation using CHARMm. Solvent surfaces stained according to protein aggregation tendency scores.

Figure 12.D: Protein aggregation propensity scores calculation using CHARMm. Show detail of site 1.

Table 2: Developability indices of the anti-CEA scFv

5. Redesign of antibody molecules

5.1 Humanization of anti-CEA scFv by CDR Grafting

(i) Background
Humanization is the process of engineering an antibody that retains the antigen-binding specificity and affinity of a non-human antibody but exhibits low immunogenicity in humans without compromising stability[7]. A humanized scFv can reduce the human anti-mouse antibody response. In this project, we aimed to humanize the heavy chain variable (VH) and light chain variable (VL) regions of the anti-CEA scFv to develop a humanized scFv with minimal immunogenicity and high binding affinity for CEACAM5.

(ii) Methodology
We prepared the FASTA file of the anti-CEA scFv (without Lpp-OmpA) obtained in the previous steps, opened it in Discovery Studio, and loaded the sequences into a Sequence Window for annotation. Before humanizing the scFv, we first identified and annotated the domains and complementarity-determining regions (CDRs) of the antibody sequence. Several Hidden Markov Models (HMMs) are precomputed for the protocol and used to scan the input protein sequence to identify the antibody domains. We then selected the sequence, clicked "Predict Humanizing Mutations", and set the target score parameter to 0.9. We set both the "Germline and Frequent Residues" and "Machine Learning Prediction" parameters to "True" and started the prediction. After the run completed, we checked the residue substitution alignment, which contains the mutations suggested by the protocol, and calculated the mutation stability energy of the mutations.

(iii) Results
First, we predicted the sequence properties of the anti-CEA scFv protein in Discovery Studio. The tool predicts the antibody domains, and the features can be filtered to retain only those within a specified antibody domain or the complementarity-determining regions (CDRs). The feature prediction results for the anti-CEA scFv protein sequence are shown in Figure 13.

Figure 13: Feature prediction results for the antibody protein sequence. The features, antibody domains, and CDRs are shown in the graphic. Light chain CDRs are colored magenta and heavy chain CDRs are colored red.

Predicting the humanized antibody structure requires mutating residues in the framework regions of the variable domains, identified from the sequence annotation in the previous step, so that these residues become more similar to the types found in human antibodies (Figure 14). The interactive residue analysis report for each variable domain contains a legend that defines the colors and styles used to represent the residue features in the table (Table 3, Table 4). A detailed legend is also available in the interactive report and is displayed by hovering the cursor over each residue cell.

Figure 14: Residues substitutions of humanized antibody prediction. The first sequence is a query annotated using the normal antibody annotation style, and the second sequence is also a query sequence, but cursor residues are indicated by blue caret, hotspots are orange, and the combined Kabat and IMGT CDR regions are magenta or red, depending on the domain. The next three seq