Difference between revisions of "Part:BBa K4815000:Design"
Revision as of 18:05, 10 October 2023
PYPH1 -> Pymaker-generated yeast promoter, High 1
- COMPATIBLE WITH RFC[10]
- INCOMPATIBLE WITH RFC[12]: Illegal NheI site found at 1
- INCOMPATIBLE WITH RFC[21]: Illegal BamHI site found at 198
- COMPATIBLE WITH RFC[23]
- COMPATIBLE WITH RFC[25]
- INCOMPATIBLE WITH RFC[1000]: Illegal BsaI site found at 78
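The incompatibilities above reflect internal restriction sites in the part sequence. As a minimal illustration (not the Registry's own compatibility checker), the Biopython sketch below scans a sequence for the three offending enzymes; the example sequence is a hypothetical stand-in, not the actual PYPH1 insert.

# Scan a part sequence for the restriction sites flagged above:
# NheI (RFC[12]), BamHI (RFC[21]) and BsaI (RFC[1000]).
# Illustrative only; the sequence is a hypothetical stand-in for PYPH1.
from Bio.Seq import Seq
from Bio.Restriction import NheI, BamHI, BsaI

part = Seq("GCTAGC" + "A" * 74)  # hypothetical 80 bp sequence containing an NheI site

for enzyme in (NheI, BamHI, BsaI):
    hits = enzyme.search(part)  # 1-based site positions on the sense strand
    if hits:
        print(f"Illegal {enzyme} site found at {hits}")
    else:
        print(f"No {enzyme} site: compatible with the corresponding RFC")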
Design Notes
We set out on a large-scale search for raw data suitable for training an AI and found a dataset published with a Nature article: in total, about 30 million core promoter sequences paired with expression data, in the format shown in the figure below. The core promoter sequences were randomly synthesized, and their expression rates were measured as relative fluorescence intensity by a high-throughput technique (described in detail in the wet lab cycle on our Engineering Success page). The dataset is large enough to broadly cover the possible interactions between the 80 bp core promoter and transcription factors. We then generated sub-datasets of the full data at various sample sizes to train Pymaker, and we used the best-performing model to generate PYPH1.
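The sub-sampling step is left implicit above, so here is a minimal sketch of one way to draw sub-datasets of increasing size from a large sequence-to-expression table. The file name "promoter_expression.tsv", its two-column layout, and the sample sizes are all assumptions for illustration, not the team's actual pipeline.

# A minimal sketch (not the team's actual pipeline) of drawing sub-datasets
# of various sizes from a table of 80 bp core promoter sequences paired with
# relative fluorescence intensities, as described above.
import random

def load_pairs(path):
    """Yield (sequence, expression) pairs from a two-column TSV file."""
    with open(path) as handle:
        for line in handle:
            seq, value = line.rstrip("\n").split("\t")
            yield seq, float(value)

random.seed(0)  # make the subsampling reproducible
pairs = list(load_pairs("promoter_expression.tsv"))  # hypothetical file name

# Sub-datasets of various sample sizes, each used to train one Pymaker model;
# the best-performing model is then used to generate candidate promoters.
for size in (10_000, 100_000, 1_000_000):
    subset = random.sample(pairs, min(size, len(pairs)))
    print(f"sub-dataset with {len(subset)} sequence/expression pairs")

In practice a 30-million-row table would be streamed rather than held in memory; the in-memory list here only keeps the sketch short.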
Source
The training data come from the Nature dataset described under Design Notes above; Pymaker was trained on sub-datasets drawn from it, and the best-performing model generated PYPH1.