Help:Sequencing Tool:Automatic Assessment Algorithm
The Registry provides a tool to automatically align and evaluate sequences against the sequence of a specified part. This is how that algorithm works as of 18 March 2008.
Definitions
Target Part - This algorithm only evaluates sequence reads against the sequence for a known part. It is not used to align sequence reads and then search for matching parts in the Registry.
Sequence - A string of the bases A, C, G, and T, indicating nucleotides, or N, indicating one unknown base.
Quality - A string of numbers corresponding one-to-one with bases in a sequence.
Sequence Read - The called set of bases from a sequencing reaction.
Raw Sequence - A sequence read as received from the sequencing center.
Forward Sequence - A sequence read on the same strand of DNA and in the same direction as the sequence of the target part. Forward sequences result from a forward primer and are read from left to right.
Reverse Sequence - The reverse complement of a sequence read that extended from right to left along a target part. The sequencing program converts all raw sequences to either forward or reverse sequences based on the primer used.
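The conversion of a raw sequence based on the primer can be sketched as follows. This is a minimal illustration, not the Registry's code: the names reverse_complement and orient_read and the primer_is_forward flag are hypothetical, and treating N as its own complement is an assumption.

```python
# Complement table for the bases defined above; N (unknown) is assumed to stay N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a sequence of A, C, G, T, and N."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def orient_read(raw: str, primer_is_forward: bool) -> str:
    """Orient a raw read so it runs left to right along the target part.

    A read from a forward primer is kept as-is. A read from a reverse
    primer extended right to left, so it is reverse complemented.
    """
    return raw.upper() if primer_is_forward else reverse_complement(raw)


if __name__ == "__main__":
    print(orient_read("AACG", primer_is_forward=True))
    print(orient_read("AACG", primer_is_forward=False))
```

If quality values accompany the read, they would presumably need to be reversed along with the bases so that the one-to-one correspondence is preserved.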
New Sequences
When a new sequence is entered or new quality information is entered for a sequence, the software searches for the BioBrick™ prefix and suffix using this algorithm:
The called sequence is searched for the BioBrick Prefix and Suffix using Find_Best_Match. The match must have a score of 15 points or more out of the possible 21 points. The result of this search will be used to display the Prefix and Suffix boxes under the displayed sequence.
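The 15-point threshold can be illustrated with a short sketch. It assumes the scoring described under Best Matching Sequences (one point per match, minus two per mismatch) and, as a simplification, only considers windows that lie entirely within the read. The function names are hypothetical, and PLACEHOLDER_MOTIF is a made-up 21-base string, not the actual BioBrick prefix or suffix.

```python
MATCH, MISMATCH = 1, -2   # scoring taken from the Find_Best_Match description
MIN_SCORE = 15            # acceptance threshold stated above

# Made-up 21-base motif for illustration only; NOT a real BioBrick sequence.
PLACEHOLDER_MOTIF = "GACCTTAGCGATACGTTCAGA"


def score_window(window: str, motif: str) -> int:
    """Score one alignment: +1 for each matching base, -2 for each mismatch."""
    return sum(MATCH if a == b else MISMATCH for a, b in zip(window, motif))


def locate_motif(read: str, motif: str, min_score: int = MIN_SCORE):
    """Return (offset, score) of the best window scoring at least min_score.

    Returns None when no window reaches the threshold.
    """
    best = None
    for offset in range(len(read) - len(motif) + 1):
        score = score_window(read[offset:offset + len(motif)], motif)
        if score >= min_score and (best is None or score > best[1]):
            best = (offset, score)
    return best


if __name__ == "__main__":
    read = "TTTT" + PLACEHOLDER_MOTIF + "CCCC"
    print(locate_motif(read, PLACEHOLDER_MOTIF))  # (4, 21)
```

Under these assumptions each mismatch costs three points relative to a perfect match (one point lost, two subtracted), so a fully overlapping 21-base motif still reaches 15 points with two mismatches (19 - 4 = 15) but not with three (18 - 6 = 12).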
Sequence Graphic
Each sequence is displayed as a green bar. The bar indicates the relative location of the sequence in alignments, the location of the prefix and suffix (if any), and the given quality of the sequence read.
Best Matching Sequences
This algorithm uses a utility function to align two sequences and report the alignment that matches best.
Find_Best_Match(A, B) shifts sequence A relative to sequence B one base at a time. At each position, it compares the overlapping bases pair by pair. If the two bases match, the score for that position is increased by one point; if they do not match, the score is decreased by two points. Note: This does not work well if there is a deleted or inserted base in either sequence, because the sequences are only shifted and gaps are never introduced.
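A minimal sketch of this shift-and-score procedure is shown below. It is an illustration under stated assumptions rather than the Registry's implementation: the return value (score, offset), the convention that offset is the position of the first base of A relative to the first base of B, the rule that ties go to the lowest offset, and the literal comparison of N with other characters are all assumptions not specified above.

```python
def find_best_match(a: str, b: str):
    """Slide a along b one base at a time and return (best_score, best_offset).

    offset is where a[0] sits relative to b[0]; negative offsets mean that
    a overhangs the left end of b. Only overlapping bases are scored:
    +1 for a match, -2 for a mismatch. Ties keep the lowest offset.
    """
    best_score, best_offset = None, None
    # Every shift that leaves at least one base of a overlapping b.
    for offset in range(-(len(a) - 1), len(b)):
        score = 0
        for i, base in enumerate(a):
            j = i + offset
            if 0 <= j < len(b):
                score += 1 if base == b[j] else -2
        if best_score is None or score > best_score:
            best_score, best_offset = score, offset
    return best_score, best_offset


if __name__ == "__main__":
    # GATTACA occurs in the second sequence starting at index 2.
    print(find_best_match("GATTACA", "CCGATTACACC"))  # (7, 2)
```

Because every base after an inserted or deleted base is shifted by one position, no single offset can line up both sides of the indel, which is why scores drop sharply in that case.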