Help:Sequence Analysis

This page explains how sequence data is processed and which files are stored:


The sequencing process produces a series of data points for each of the fluorescent dyes used to label the nucleic acids. This data is processed twice: once by software in the sequencer, and once by the sequencing center using the Phred base-calling program (Phred Q20).

The data points are interpreted in two ways:

1. Software in the sequencer calls bases from the data and produces two files:

   text - sequence of called bases, with an N wherever the software cannot call a base
   chromat - (or trace) the electropherogram together with the called bases from "text"
   We store these files as Sequence and Trace, respectively.

2. The sequencing center runs the data through the Phred program (Phred Q20). Phred makes its own base calls and produces four files:

   fasta - sequence of called bases (with no N's)
   qual  - a file of numeric quality scores, one for each base in the fasta file
   scf   - a combination of the chromatogram, the fasta bases, and the qual values
   phd   - a text-only file summarizing each base
   We store the fasta file as Sequence, the qual file as Quality, and the scf file as Trace.
   The Phred base calls are generally believed to be the more reliable of the two, although this is not certain.
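As an illustration of how the fasta and qual files relate, here is a minimal Python sketch that reads one record from each and pairs every base with its quality score. It assumes the common layout in which both files begin with a ">" header line and the qual file holds whitespace-separated integers; the function names and the sample read are hypothetical and are not part of this software. It also assumes the usual Phred convention that a score Q corresponds to an error probability of 10^(-Q/10), so a score of 20 means roughly a 1 in 100 chance that the call is wrong.

```python
def parse_fasta(text):
    """Return (name, sequence) for a single-record FASTA string."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a header line starting with '>'")
    return lines[0][1:].strip(), "".join(lines[1:])


def parse_qual(text):
    """Return (name, scores) for a single-record qual string."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a header line starting with '>'")
    scores = [int(tok) for ln in lines[1:] for tok in ln.split()]
    return lines[0][1:].strip(), scores


def pair_bases(sequence, scores):
    """Pair each base with its score; the two must be the same length."""
    if len(sequence) != len(scores):
        raise ValueError("fasta and qual lengths differ")
    return list(zip(sequence, scores))


def error_probability(score):
    """Error probability implied by a Phred-style score (assumed convention)."""
    return 10 ** (-score / 10)


def count_at_least(scores, threshold=20):
    """Count the bases whose score meets the threshold (Q20 by default)."""
    return sum(1 for q in scores if q >= threshold)


# Hypothetical sample data, made up for this example.
fasta_text = ">read1\nACGTAC\nGT\n"
qual_text = ">read1\n8 15 22 35 40 19\n20 12\n"

name, sequence = parse_fasta(fasta_text)
_, scores = parse_qual(qual_text)
pairs = pair_bases(sequence, scores)
q20_bases = count_at_least(scores)  # 4 of the 8 bases reach Q20
```

The length check in pair_bases matters because a qual file is only meaningful alongside the fasta file it was generated with.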

The base sequences called by these two methods are not compatible with each other.

The software keeps the two sets of base calls separate. It can store all five of the stored files (two from the sequencer and three from Phred) and deliver them to the user on request.
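One way to picture this separation is a record that holds each set of files in its own slot, so a Sequence from one method is never paired with a Trace or Quality from the other. The sketch below is only an illustration in Python; the class and field names are hypothetical and do not describe how the software is actually implemented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SequencerCalls:
    """Files produced by the software in the sequencer."""
    sequence: str   # the "text" file (may contain N's)
    trace: bytes    # the "chromat" file


@dataclass
class PhredCalls:
    """Files produced by Phred that are stored."""
    sequence: str        # the fasta file (no N's)
    quality: List[int]   # the qual file
    trace: bytes         # the scf file


@dataclass
class ReadRecord:
    """One read, with the two sets of base calls kept apart."""
    name: str
    sequencer: Optional[SequencerCalls] = None
    phred: Optional[PhredCalls] = None

    def stored_files(self):
        """List the files available for this read, labelled by origin."""
        files = []
        if self.sequencer is not None:
            files += ["sequencer/Sequence", "sequencer/Trace"]
        if self.phred is not None:
            files += ["phred/Sequence", "phred/Quality", "phred/Trace"]
        return files


# Hypothetical read with both sets of files present.
record = ReadRecord(
    name="read1",
    sequencer=SequencerCalls(sequence="ACGNACGT", trace=b""),
    phred=PhredCalls(sequence="ACGTACGT", quality=[20] * 8, trace=b""),
)
available = record.stored_files()  # five entries, one per stored file
```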