Help:Sequencing Tool:Automatic Assessment Algorithm

Revision as of 19:58, 19 March 2008 by Randy

The Registry provides a tool to automatically align and evaluate sequences against the sequence of a specified part. This is how that algorithm works as of 18 March 2008.

Definitions

Target Part - This algorithm only evaluates sequence reads against the sequence of a known part. It is not used to align sequence reads and then search for matching parts in the Registry.

Sequence - A string of the letters A, C, G, and T, each indicating a nucleotide base, or N, indicating one unknown base.

Quality - A string of numbers corresponding one-to-one with bases in a sequence.

Sequence Read - The called set of bases from a sequencing reaction.

Raw Sequence - A sequence read as received from the sequencing center.

Forward Sequence - A sequence read on the same strand of DNA and in the same direction as the sequence of the target part. Forward sequences are the result of a forward primer. Forward sequences have been read from left to right.

Reverse Sequence - The reverse complement of a sequence read that extended from right to left along a target part. The sequencing program converts all raw sequences to either forward or reverse sequences based on the primer used.
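The conversion from a raw read to a forward or reverse sequence can be sketched roughly as follows. This is a minimal illustration, not the Registry's code: the function names and the primer_is_forward flag are hypothetical, and N is assumed to stay N when complemented.

```python
# Complement table for the bases used in this document; N (unknown) stays N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Complement every base, then reverse the order of the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient_raw_sequence(raw, primer_is_forward):
    """Return the read in the orientation of the target part.

    primer_is_forward is a hypothetical flag saying which primer
    produced the raw read.
    """
    if primer_is_forward:
        return raw                      # already left to right along the part
    return reverse_complement(raw)      # flip a right-to-left read


print(orient_raw_sequence("AACGTN", primer_is_forward=False))  # NACGTT
```

After this step, every read can be compared base by base against the target part without regard to which primer produced it.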


New Sequences

When a new sequence is entered or new quality information is entered for a sequence, the software searches for the BioBrick™ prefix and suffix using this algorithm:

The called sequence is searched for the BioBrick™ prefix and suffix using Find_Best_Match. A match must score at least 15 of the 21 possible points. The result of this search is used to display the prefix and suffix boxes under the displayed sequence.
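As a rough check on what the 15-point threshold allows, the sketch below scores a 21-base pattern against a window of the same length, using the scoring described under Find Best Match (one point per match, minus two per mismatch). The pattern is a placeholder, not the real prefix or suffix, the helper names are hypothetical, and the pattern is assumed to overlap the read completely.

```python
MATCH_POINTS = 1
MISMATCH_POINTS = -2
THRESHOLD = 15

# Placeholder 21-base pattern for illustration only.
PATTERN = "ACGTACGTACGTACGTACGTA"


def window_score(pattern, window):
    """Score two equal-length strings position by position."""
    return sum(
        MATCH_POINTS if p == w else MISMATCH_POINTS
        for p, w in zip(pattern, window)
    )


def is_accepted(pattern, window):
    """True when the window reaches the 15-point threshold."""
    return window_score(pattern, window) >= THRESHOLD


def with_mismatches(seq, positions):
    """Return seq with the base at each given position changed."""
    bases = list(seq)
    for i in positions:
        bases[i] = "A" if bases[i] != "A" else "C"
    return "".join(bases)


print(window_score(PATTERN, PATTERN))                              # 21
print(window_score(PATTERN, with_mismatches(PATTERN, [0, 20])))    # 15
print(window_score(PATTERN, with_mismatches(PATTERN, [0, 4, 20]))) # 12
```

Under these assumptions each mismatch costs three points relative to a perfect score (one point lost, two subtracted), so a full-length match passes with at most two mismatched bases.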

Sequence Graphic

Each sequence is displayed as a bar in the information box for that sequence and in the Automatic Alignment box if the sequence aligns with the part. The bar is green where the quality is "good" and gray where the quality is not "good". If a BioBrick™ prefix or suffix is found, it is marked as a red brick box under the sequence bar.




Find Best Match

The sequencing algorithms use a utility function to compare two sequences and report the alignment which matches best.

Find_Best_Match(A, B) shifts sequence A relative to sequence B one base at a time. At each shift, it compares the overlapping bases pair by pair. If the two bases match, the score for that alignment is increased by one point; if they do not match, the score is decreased by two points. Note: This does not work well if there is a deleted or inserted base in either sequence.
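The description above can be sketched in a few lines. This is one possible reading, not the Registry's implementation: it assumes that every shift with at least one overlapping base is tried, that ties go to the first shift found, and that N is compared like any other letter. The return format is also an assumption.

```python
def find_best_match(a, b):
    """Slide a along b and return (best_score, offset).

    offset is the index in b that lines up with a[0]; it is negative
    when a starts before the beginning of b.
    """
    best_score, best_offset = None, None
    # Try every shift that leaves at least one base overlapping.
    for offset in range(-(len(a) - 1), len(b)):
        score = 0
        for i, base in enumerate(a):
            j = i + offset
            if 0 <= j < len(b):
                # +1 for a matching pair, -2 for a mismatching pair.
                score += 1 if base == b[j] else -2
        if best_score is None or score > best_score:
            best_score, best_offset = score, offset
    return best_score, best_offset


print(find_best_match("GATTACA", "CCGATTACACC"))  # (7, 2)
```

Because the comparison is strictly position by position, a single inserted or deleted base puts everything after it out of register, which is why the score drops sharply in that case: find_best_match("GATTACA", "GATACA") cannot recover the six bases the two strings share.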

Aligning Sequences