Help:Sequence Analysis:Software Design

Software design 6-6-2008

I now have the sequence analysis software working much better.

  • Edits go in and are saved.
  • Comments are saved.
  • The user can specify the result of the analysis.
  • In the error section, it tells you when the read is beyond the 900 bp limit.

Note: deleting a base works. Changing a base works. To insert, just change a base to more than one base.
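The editing rule in the note above can be sketched in a few lines. This is only an illustration, not the Registry's actual code: the function name `apply_edit` and the 0-based position are assumptions made for the example.

```python
def apply_edit(sequence: str, position: int, replacement: str) -> str:
    """Replace the single base at `position` (0-based, an assumption here).

    An empty replacement deletes the base, a one-base replacement changes it,
    and a longer replacement acts as an insertion.
    """
    if not 0 <= position < len(sequence):
        raise IndexError(f"position {position} is outside the read")
    return sequence[:position] + replacement + sequence[position + 1:]


read = "ACGTAC"
deleted = apply_edit(read, 2, "")     # drop the G
changed = apply_edit(read, 2, "T")    # G becomes T
inserted = apply_edit(read, 2, "GA")  # G becomes GA, so an A is inserted after it
```

Treating all three edits as one "replace a base with a string" operation keeps the stored edit record uniform.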

I have gone through a bunch of parts in plate 1000 with no errors.

After examining and editing some sequences, I found that most of the parts were good after all. Some were clearly wrong.

However, while it was possible to find that parts were good, it was not easy. It was necessary to use all the information available. For example, in one part, the first 800 bases were easy, but I had to look at the "blast against part" for both directions to see that all the bases were well covered.

One part had been processed by Long Read. This fixed up an otherwise bad reading.

I noticed that some of the Phred trace files are missing. We should find out why, but this is not urgent.


Notes for the next design

Having done this version of the software, the design for the next version is clearer. Perhaps this can be done in the late summer or early fall. Here are some changes that should be made.

1. Use the Phred data. (However, some users will only have machine-called sequences. Perhaps we can run their raw data through Phred for them.)

2. The Long Read program did a better job of calling bases than the machine when the quality was low. We need to see if this is generally true.

3. The current software makes a single alignment of the reading to the part. Insertions and deletions are not handled at all. This is probably the largest issue with the program. A new version needs to provide a print-out like the 2-sequence blast so that you can easily see the full alignment between the part and each of the sequences.

4. The editing must be like normal WYSIWYG editing. Drag across the sequence and type.

5. Internally, the software needs a new data structure to deal with all these changes.

6. We should be able to see the electropherogram on the web site, lined up with the part and the sequences.
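As a rough sketch of the kind of print-out item 3 asks for, here is a small gapped alignment of one reading against the part. It uses a plain Needleman-Wunsch global alignment, which is not what blast does internally, and the scoring values and function names are assumptions chosen for illustration.

```python
def align(part: str, read: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global (Needleman-Wunsch) alignment; returns both sequences with '-' for gaps."""
    rows, cols = len(part) + 1, len(read) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        # Score for aligning part[i-1] with read[j-1].
        return match if part[i - 1] == read[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # match or mismatch
                score[i - 1][j] + gap,             # base missing from the read
                score[i][j - 1] + gap,             # extra base in the read
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    top, bottom = [], []
    i, j = len(part), len(read)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(part[i - 1]); bottom.append(read[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(part[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(read[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def format_alignment(aligned_part: str, aligned_read: str) -> str:
    """Three-line print-out with '|' under every matching column."""
    middle = "".join("|" if a == b else " " for a, b in zip(aligned_part, aligned_read))
    return f"part  {aligned_part}\n      {middle}\nread  {aligned_read}"


aligned_part, aligned_read = align("ACGTACGT", "ACGACGT")
print(format_alignment(aligned_part, aligned_read))
```

A deletion in the reading shows up as a `-` in the read line, so the bases after it stay lined up with the part instead of all being reported as mismatches. A real version would repeat this for every reading of the part.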

It may be possible for some other program to take our information, let the user do a good job, and then dump its modified document back into the Registry so that others can see what was done. (This seems unlikely.)

Other comments?

Comments on the software

The "Blast against target part" tool already does that kind of alignment. The current software is a mix of an integrated system and some independent tools. The integrated portion is the alignment of all the sequences and then the base-by-base comparison with the part.

The independent tools are:

  • The "Blast against target part"
  • 4Peaks to display trace files with accurate identification of the problem base
  • User-settable result (it remembers what the user says and what the computer says, but always reports the user's result)
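The user-settable result could be modeled roughly as below. The class name, field names, and result labels are hypothetical, and falling back to the computer's result when the user has not set one is an assumption, not something the current software is known to do.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnalysisResult:
    computer_result: str                # what the software concluded
    user_result: Optional[str] = None   # what the user decided, if anything

    @property
    def reported(self) -> str:
        # The user's result wins whenever it exists; both values stay stored.
        if self.user_result is not None:
            return self.user_result
        return self.computer_result     # assumed fallback


result = AnalysisResult(computer_result="bad sequence")
before = result.reported
result.user_result = "confirmed"
after = result.reported
```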

I have been noticing that talks about sequence analysis always mention the degree of coverage. Coverage of 5 to 10 times seems normal. Perhaps we are going to need more coverage.
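To make "coverage" concrete, here is a minimal sketch that counts how many readings cover each base of a part. The function name and the representation of a reading as a 0-based, half-open (start, end) span on the part are assumptions for this example.

```python
def coverage_depth(part_length: int, read_spans: list[tuple[int, int]]) -> list[int]:
    """Number of readings covering each base of the part."""
    depth = [0] * part_length
    for start, end in read_spans:
        # Clip each span to the part so readings that run past the ends are safe.
        for position in range(max(start, 0), min(end, part_length)):
            depth[position] += 1
    return depth


# A 10-base part with one forward and one reverse reading that overlap in the middle.
depth = coverage_depth(10, [(0, 6), (4, 10)])
minimum_coverage = min(depth)
mean_coverage = sum(depth) / len(depth)
```

With only a forward and a reverse reading, most bases are covered once and only the overlap is covered twice, which is far below the 5 to 10 times mentioned in the talks.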

Also, I keep seeing dye blobs, but I can still read the bases under the blob.

I suspect it is very different when we try to confirm a known sequence as opposed to finding an unknown sequence.