Difference between revisions of "Talk:Protein coding sequences"

Revision as of 16:20, 17 February 2009

Protein naming references

Alphabet soup

  • [http://uniprot.org UniProt] - this is the umbrella organization of Swiss-Prot and TrEMBL
  • [http://www.uniprot.org/help/uniprotkb UniProtKB] - this database is the superset of Swiss-Prot (manually curated) and TrEMBL (autogenerated)
    • The protein sequences are derived from the translation of the coding sequences (CDS) which have been submitted to the public nucleic acid database, the EMBL/GenBank/DDBJ database. All these sequences, as well as the related data submitted by the authors, are automatically integrated into UniProtKB/TrEMBL.
  • [http://www.uniprot.org/help/uniref Uniref] - The UniRef databases provide clustered sets of sequences from UniProt Knowledgebase (including splice variants and isoforms) and selected UniParc records, in order to obtain complete coverage of sequence space at several resolutions while hiding redundant sequences (but not their descriptions) from view.
  • [http://www.genome.ad.jp/kegg/ KEGG] - they maintain their own database of proteins with KEGG numbers. Proteins are grouped by function, structure, etc.

Papers

  • [http://www.biomedcentral.com/1471-2105/8/401 Biomed] paper about a protein identifier cross-referencing tool, [http://www.ebi.ac.uk/Tools/picr/ PICR].
  • [http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=18697767 Bioinformatics] paper about [http://llama.med.harvard.edu/synergizer/translate/ the synergizer] which translates identifiers from one database to another.


  • BBa_J31004 - change the vector of this part
  • BBa_Y00029 - remove the prefix and suffix
  • Rename chromatinremodeling to chromatin

Questions

  • How do we want to indicate relationships between parts? For example: protein coding sequences and generators, transcriptional regulators and cognate promoters, and similar versions of protein coding sequences.

Categories

Part level function

  • //cds/reporter
    • //cds/reporter/fluorescentprotein
    • //cds/reporter/color
    • //cds/reporter/luciferase
  • //cds/selectionmarker/antibioticresistance
  • //cds/enzyme
    • //cds/enzyme/selectionmarker/antibioticresistance (//protein/enzyme/antibiotic_resistance)
    • //cds/enzyme/recombinase (//protein/enzyme/dna_excision_integration)
    • //cds/enzyme/biosynthesis/odorant (//protein/enzyme/smell)
    • //cds/enzyme/biosynthesis/AHL (//protein/enzyme/hsl_synthesis) --> synthases?
    • //cds/enzyme/protease
    • //cds/enzyme/degradation/AHL --> lyases?
    • //cds/enzyme/methylation --> should be split into methylases, demethylases
    • //cds/enzyme/phosphorylation --> should be split into phosphorylases, kinases, and phosphatases
  • //cds/transcriptionalregulator (//protein/repact/uncategorized)
    • //cds/transcriptionalregulator/activator
    • //cds/transcriptionalregulator/repressor
  • //cds/membrane
    • //cds/membrane/receptor
    • //cds/membrane/transporter
    • //cds/membrane/channel
    • //cds/membrane/pump
    • //cds/membrane/binding
    • //cds/membrane/surfacedisplay
  • //cds/binding
    • //cds/binding/DNA
    • //cds/binding/lead
  • //cds/chromatin

Device level function

  • //function/motility
  • //function/conjugation
  • //function/sensor
    • //function/sensor/light
    • //function/sensor/lead
    • //function/sensor/odor
  • //function/odor

Parameters

The following parameters can be auto-selected by the computer.

  • Completeness <-- need a better name here
    • Full-length
      • begins with ATG/GTG and ends with TAA/TGA/TAG
      • begins with TTA/TCA/CTA and ends with CAT/CAC
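    • N-terminus
      • begins with ATG/GTG but does not end with TAA/TGA/TAG
      • ends with CAT/CAC but does not begin with TTA/TCA/CTA
    • C-terminus
      • ends with TAA/TGA/TAG but does not begin with ATG/GTG
      • begins with TTA/TCA/CTA but does not end with CAT/CAC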
    • Fusion
    • Domain
      • has neither an ATG/GTG nor a TAA/TGA/TAG
  • Direction
    • Forward
      • begins with ATG/GTG or ends with TAA/TGA/TAG
    • Reverse
      • begins with TTA/TCA/CTA or ends with CAT/CAC
  • Degradation tag
    • probably ought to be able to auto-detect and annotate the different degradation tags as well. I'll compile the different sequences and tag names. May want to somehow tie this with the Protein tags and modifiers parts.
  • Signal sequences
    • do the same with N-terminal tags?
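
The Direction and Completeness rules above could be auto-assigned along these lines. This is only a minimal sketch, not a settled implementation: the function name `classify_cds` is hypothetical, it assumes an N-terminal fragment has a start codon but no stop codon (and the opposite for a C-terminal fragment), it checks the forward orientation first as an arbitrary tie-break, and it does not try to detect Fusion parts, which cannot be recognised from the end codons alone.

```python
# Codons as listed in the Parameters notes.
START_CODONS = {"ATG", "GTG"}
STOP_CODONS = {"TAA", "TGA", "TAG"}
# Reverse complements, as seen when the part is stored in reverse orientation.
REV_STOP_CODONS = {"TTA", "TCA", "CTA"}   # found at the beginning of the sequence
REV_START_CODONS = {"CAT", "CAC"}         # found at the end of the sequence


def classify_cds(sequence):
    """Return (direction, completeness) for a DNA sequence string.

    direction is "Forward", "Reverse", or None when no start/stop codon
    is found at either end (the "Domain" case).
    """
    seq = sequence.upper()
    first, last = seq[:3], seq[-3:]

    fwd_start = first in START_CODONS
    fwd_stop = last in STOP_CODONS
    rev_stop = first in REV_STOP_CODONS
    rev_start = last in REV_START_CODONS

    # Assumption: forward evidence wins if both orientations match.
    if fwd_start or fwd_stop:
        direction, has_start, has_stop = "Forward", fwd_start, fwd_stop
    elif rev_start or rev_stop:
        direction, has_start, has_stop = "Reverse", rev_start, rev_stop
    else:
        return None, "Domain"

    if has_start and has_stop:
        completeness = "Full-length"
    elif has_start:
        completeness = "N-terminus"
    else:
        completeness = "C-terminus"
    return direction, completeness
```

For example, `classify_cds("ATGAAATAA")` gives `("Forward", "Full-length")` and `classify_cds("TTATTTCAT")` gives `("Reverse", "Full-length")`. A check like this only looks at the first and last three bases, so a domain that happens to begin with ATG would be mislabelled as an N-terminus; the auto-generated value would probably need to be overridable by hand.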

Columns

  • all protein coding sequences
    • Protein
    • Description
    • Direction (auto-generated)
    • Completeness (auto-generated)
    • Tags (auto-generated based on sequence search)
    • SwissProt
    • KEGG
    • Length
  • all enzymes
    • EC number
    • substrate
    • product
  • fluorescent_reporter_protein_coding_sequences
    • excitation
    • emission
    • color
  • luminescent_reporter_protein_coding_sequences
  • color_reporter_protein_coding_sequences
    • color
  • antibiotic_resistance_marker
    • antibiotic
  • recombination_protein_coding_sequences
    • recombination site
  • transcriptional_regulators
    • operator site sequence
    • ligand
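
One way to read the Columns list is that every table shows the common columns, with the category-specific columns appended. A rough sketch of that reading follows; the names `COMMON_COLUMNS`, `CATEGORY_COLUMNS`, and `columns_for` are hypothetical, and whether a part in several groups (for instance an enzyme that is also an antibiotic resistance marker) should get the columns of both is left open here.

```python
# Columns shown for all protein coding sequences.
COMMON_COLUMNS = [
    "Protein", "Description", "Direction", "Completeness", "Tags",
    "SwissProt", "KEGG", "Length",
]

# Extra columns per group, copied from the Columns list.
CATEGORY_COLUMNS = {
    "all enzymes": ["EC number", "substrate", "product"],
    "fluorescent_reporter_protein_coding_sequences": ["excitation", "emission", "color"],
    "luminescent_reporter_protein_coding_sequences": [],
    "color_reporter_protein_coding_sequences": ["color"],
    "antibiotic_resistance_marker": ["antibiotic"],
    "recombination_protein_coding_sequences": ["recombination site"],
    "transcriptional_regulators": ["operator site sequence", "ligand"],
}


def columns_for(category):
    """Return the common columns followed by the group's own columns.

    Unknown groups fall back to the common columns only.
    """
    return COMMON_COLUMNS + CATEGORY_COLUMNS.get(category, [])
```

Calling `columns_for("transcriptional_regulators")` would then return the eight common columns followed by "operator site sequence" and "ligand".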