Help:Sequencing Tool:Automatic Assessment Algorithm

The Registry provides a tool to automatically align and evaluate sequences against the sequence of a specified part. This is how that algorithm works as of 18 March 2008.

Definitions

Target Part - This algorithm only evaluates sequence reads against the sequence for a known part. It is not used to align sequence reads and then search for matching parts in the Registry.

Sequence - A string of the bases A, C, G, and T, or N, where N indicates one unknown base.

Quality - A string of numbers corresponding one-to-one with bases in a sequence.

Sequence Read - The called set of bases from a sequencing reaction.

Raw Sequence - A sequence read as received from the sequencing center.

Forward Sequence - A sequence read on the same strand of DNA and in the same direction as the sequence of the target part. Forward sequences are a result of a forward primer. Forward sequences have been read from left to right.

Reverse Sequence - The reverse complement of a sequence read that extended from right-to-left along a target part. The sequencing program converts all raw sequences to either forward or reverse sequences based on the primer used.
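
As an illustration of the reverse complement mentioned above, here is a minimal sketch in Python. The function name and the choice to map N to N are assumptions made for this example; they are not taken from the Registry software.

```python
# Complement of each base; an unknown base (N) stays unknown (assumption).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a sequence of A, C, G, T and N."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


if __name__ == "__main__":
    # A read that extended right-to-left along the part, converted so
    # that it can be compared with the part's own sequence.
    print(reverse_complement("AACGTN"))
```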

New Sequences

When a new sequence is entered or new quality information is entered for a sequence, the software searches for the BioBrick™ prefix and suffix using this algorithm:

The called sequence is searched for the BioBrick Prefix and Suffix using Find_Best_Match. The match must have a score of 15 points or more out of the possible 21 points. The result of this search will be used to display the Prefix and Suffix boxes under the displayed sequence.

Sequence Graphic

Each sequence is displayed as a bar in the information box for that sequence and, if the sequence aligns with the part, in the Automatic Alignment box. The bar is green where the quality is "good" and gray where the quality is not "good". If a BioBrick™ prefix or suffix is found, it is marked as a red brick box under the sequence bar.

Find Best Match

The sequencing algorithms use a utility function to compare two sequences and report the alignment which matches best.

Find_Best_Match(A, B) shifts sequence A relative to sequence B one base at a time. At each position, it compares the overlapping bases of the two sequences. If two bases match, the score for that position is increased by one point; if they do not match, the score is decreased by two points. The position with the highest score is reported as the best match. Note: This does not work well if there is a deleted or inserted base in either sequence.
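
The scoring described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Registry's actual code: the function name, the returned (score, offset) pair, the treatment of N as an ordinary character, and the rule that the first of several equal scores wins are all assumptions made for the example.

```python
from typing import Tuple


def find_best_match(a: str, b: str) -> Tuple[int, int]:
    """Slide a along b and return (best_score, offset).

    offset is the index in b that lines up with a[0]; it is negative
    when a hangs off the left end of b. Only overlapping bases are
    scored: +1 for a match, -2 for a mismatch.
    """
    if not a or not b:
        return 0, 0
    best_score = None
    best_offset = 0
    # Try every shift that leaves at least one base overlapping.
    for offset in range(-(len(a) - 1), len(b)):
        score = 0
        for i, base in enumerate(a):
            j = offset + i
            if 0 <= j < len(b):
                score += 1 if base == b[j] else -2
        # Keep the first shift that reaches the highest score (assumption).
        if best_score is None or score > best_score:
            best_score, best_offset = score, offset
    return best_score, best_offset


if __name__ == "__main__":
    print(find_best_match("ACGT", "TTACGTTT"))
```

A caller could then compare the returned score with a threshold, such as the 15 of 21 points required when searching for the BioBrick prefix and suffix. Because the sketch only shifts one sequence against the other, a single inserted or deleted base puts every base after it out of register, which is why the method handles such reads poorly.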

Automatic Alignment

The Automatic Alignment section compares each sequence read to the target part's sequence.


Each base in the part is compared to the corresponding base in every aligned sequence read. The program scans through all of the aligned sequences and selects the sequence with the highest quality value at that base. If the selected read's base is an N, then the part is marked as "N" at that base. If the read's base and the part's base agree, then that base of the part is marked as "Good". If they disagree, that base of the part is marked as "Bad". If the sequence read does not have quality information, then all of the aligned sequences are examined. N's are ignored. If ANY of the sequence reads agree with the part, then the base is marked as "Good". Otherwise, the base is marked as "Bad".
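
The per-base rules above might be sketched in Python like this. The AlignedRead class, the function name, and the mark values are hypothetical names chosen for the example. The sketch also makes assumptions where the description is silent: the no-quality rule is applied only when none of the reads covering a base carries quality values, and a base covered by no read is left unmarked (None).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlignedRead:
    offset: int                          # index in the part aligned with bases[0]
    bases: str                           # called bases: A, C, G, T or N
    quality: Optional[List[int]] = None  # one value per base, if available


def mark_part(part: str, reads: List[AlignedRead]) -> List[Optional[str]]:
    """Return one mark per base of the part: "Good", "Bad", "N" or None."""
    marks: List[Optional[str]] = []
    for pos, part_base in enumerate(part):
        # Collect (base, quality) from every read that covers this position.
        covering = []
        for read in reads:
            i = pos - read.offset
            if 0 <= i < len(read.bases):
                q = read.quality[i] if read.quality is not None else None
                covering.append((read.bases[i], q))
        if not covering:
            marks.append(None)  # no read covers this base (assumption)
            continue
        scored = [(b, q) for b, q in covering if q is not None]
        if scored:
            # Trust the read with the highest quality at this base.
            best_base, _ = max(scored, key=lambda bq: bq[1])
            if best_base == "N":
                marks.append("N")
            else:
                marks.append("Good" if best_base == part_base else "Bad")
        else:
            # No quality information: ignore N's, any agreement is enough.
            agrees = any(b == part_base for b, _ in covering if b != "N")
            marks.append("Good" if agrees else "Bad")
    return marks


if __name__ == "__main__":
    reads = [AlignedRead(0, "ACNT", [40, 40, 5, 10]),
             AlignedRead(2, "GA", [30, 50])]
    print(mark_part("ACGT", reads))
```

In the usage example, the last base of the part is marked "Bad" because the read with the higher quality value at that position calls an A where the part has a T.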

Status for the reading is reported as "Inconsistent" if any of the bases of the part were marked as "Bad". If all of the bases are marked "Good", the part is marked as "Confirmed". If any base is marked as neither "Good" nor "Bad", the result is marked as "Not enough information".
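
The status rules above can be expressed as a short Python function. This is a sketch only: the function name and the list-of-marks input are assumptions, as is the precedence when a part has both "Bad" bases and unmarked bases (the sketch follows the order of the rules as written, so "Inconsistent" wins). A part with no bases at all is treated as "Not enough information", which is also an assumption.

```python
from typing import List, Optional


def overall_status(marks: List[Optional[str]]) -> str:
    """Summarise per-base marks ("Good", "Bad", "N" or None) for a part."""
    # Any disagreement with the part makes the result inconsistent.
    if any(mark == "Bad" for mark in marks):
        return "Inconsistent"
    # Every base must be positively confirmed.
    if marks and all(mark == "Good" for mark in marks):
        return "Confirmed"
    # At least one base is neither Good nor Bad (for example "N").
    return "Not enough information"


if __name__ == "__main__":
    print(overall_status(["Good", "Good", "N"]))
```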