FASTA Format Explained

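FASTA is one of the most commonly encountered sequence formats. This note briefly describes what the format looks like.
A FASTA record begins with the marker ">" followed by a single description line; the sequence itself follows on one or more subsequent lines. Each line should preferably be no longer than 80 characters.
For example: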

  >gi|532319|pir|TVFV2E|TVFV2E envelope protein
  ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
  QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
  HFPSNWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
  MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPGPCVQRTYVACHIRSVIIWLETISKK
  TYAPPREGHLECTSTVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRYKLVEITPIGF
  APTEVRRYTGGHERQKRVPFVXXXXXXXXXXXXXXXXXXXXXXVQSQHLLAGILQQQKNL
  LAAVEAQQQMLKLTIWGVK
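
The layout described above (a ">" description line followed by wrapped sequence lines) can be read and written with a few lines of Python. This is a minimal sketch, not a reference implementation: the names `parse_fasta` and `format_fasta` are hypothetical, and the default wrap width of 60 is an assumption taken from the example record, which stays under the suggested 80-character limit.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (description, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence line belonging to the current record
    if header is not None:
        yield header, "".join(chunks)


def format_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Return one FASTA record with the sequence wrapped to `width` columns."""
    rows = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([">" + header] + rows)


# First two sequence lines of the example record above.
example = """>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
"""
records = list(parse_fasta(example.splitlines()))
```

Here `records` holds a single pair whose first element is the description line without the leading ">" and whose second element is the sequence with the line breaks removed.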

The meaning of each letter or symbol is given below.
Nucleotide sequences:

  A --> adenosine           M --> A C (amino)
  C --> cytidine            S --> G C (strong)
  G --> guanosine           W --> A T (weak)
  T --> thymidine           B --> G T C
  U --> uridine             D --> G A T
  R --> G A (purine)        H --> A C T
  Y --> T C (pyrimidine)    V --> G C A
  K --> G T (keto)          N --> A G C T (any)
  -     gap of indeterminate length
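
The nucleotide codes above map naturally onto a lookup table. The sketch below is illustrative only: `IUPAC_NUCLEOTIDES`, `expand_code`, and `is_valid_nucleotide_sequence` are hypothetical names, and treating lowercase letters the same as uppercase is an assumption that the table itself does not state.

```python
# Each code maps to the set of bases it can stand for, as listed in the table.
IUPAC_NUCLEOTIDES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "GA", "Y": "TC", "K": "GT", "M": "AC",
    "S": "GC", "W": "AT",
    "B": "GTC", "D": "GAT", "H": "ACT", "V": "GCA",
    "N": "AGCT",
}
GAP = "-"  # gap of indeterminate length


def expand_code(code: str) -> set:
    """Return the set of bases that a single nucleotide code may represent."""
    return set(IUPAC_NUCLEOTIDES[code.upper()])  # assumption: case-insensitive


def is_valid_nucleotide_sequence(sequence: str) -> bool:
    """True if every character is a listed nucleotide code or the gap symbol."""
    return all(ch == GAP or ch.upper() in IUPAC_NUCLEOTIDES for ch in sequence)
```

For instance, `expand_code("R")` gives the two purines G and A, and a sequence containing a letter outside the table, such as `Z`, is rejected by `is_valid_nucleotide_sequence`.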

Amino acid sequences:

  A  alanine                    P  proline
  B  aspartate or asparagine    Q  glutamine
  C  cysteine                   R  arginine
  D  aspartate                  S  serine
  E  glutamate                  T  threonine
  F  phenylalanine              U  selenocysteine
  G  glycine                    V  valine
  H  histidine                  W  tryptophan
  I  isoleucine                 Y  tyrosine
  K  lysine                     Z  glutamate or glutamine
  L  leucine                    X  any
  M  methionine                 *  translation stop
  N  asparagine                 -  gap of indeterminate length
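
As a rough illustration of how the amino acid table can be used, the sketch below counts the codes in a protein sequence and reports any character that the table does not list. The names `AMINO_ACID_CODES` and `summarize_protein` are hypothetical, and uppercasing the input before lookup is an assumption.

```python
from collections import Counter

# One-letter codes exactly as listed in the table above.
AMINO_ACID_CODES = {
    "A": "alanine", "B": "aspartate or asparagine", "C": "cysteine",
    "D": "aspartate", "E": "glutamate", "F": "phenylalanine",
    "G": "glycine", "H": "histidine", "I": "isoleucine", "K": "lysine",
    "L": "leucine", "M": "methionine", "N": "asparagine", "P": "proline",
    "Q": "glutamine", "R": "arginine", "S": "serine", "T": "threonine",
    "U": "selenocysteine", "V": "valine", "W": "tryptophan",
    "Y": "tyrosine", "Z": "glutamate or glutamine", "X": "any",
    "*": "translation stop", "-": "gap of indeterminate length",
}


def summarize_protein(sequence: str):
    """Return (Counter of listed codes, sorted list of unlisted characters)."""
    counts = Counter()
    unknown = set()
    for ch in sequence.upper():  # assumption: lowercase input is accepted
        if ch in AMINO_ACID_CODES:
            counts[ch] += 1
        else:
            unknown.add(ch)
    return counts, sorted(unknown)


# A short fragment containing a run of the "any" code X.
counts, unknown = summarize_protein("APTEVRRYTGGHERQKRVPFVXXXX")
```

A run of `X` such as the one in the example record shows up as a high count for the "any" code, which is a quick way to spot masked or undetermined regions.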
