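FASTA is one of the most commonly encountered sequence formats. Here is a brief description of it.
A FASTA record begins with the marker ">", followed by a single description line on that same line; the lines after it contain the sequence data. Each line should preferably be no longer than 80 characters.
For example: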
- >gi|532319|pir|TVFV2E|TVFV2E envelope protein
- ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
- QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
- HFPSNWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
- MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPGPCVQRTYVACHIRSVIIWLETISKK
- TYAPPREGHLECTSTVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRYKLVEITPIGF
- APTEVRRYTGGHERQKRVPFVXXXXXXXXXXXXXXXXXXXXXXVQSQHLLAGILQQQKNL
- LAAVEAQQQMLKLTIWGVK
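The layout described above (a ">" description line followed by wrapped sequence lines) can be read and written with a few lines of code. The following is a minimal sketch in Python, not a reference implementation: the function names `parse_fasta` and `format_fasta` are hypothetical, the default width of 60 is an assumption chosen to match the example record, and the sample text reuses only the first two sequence lines of that record.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Shortened copy of the example record (header plus its first two sequence lines).
EXAMPLE = """>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
"""


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (description, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header = line[1:]
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before any '>' line")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # emit the last record


def format_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Return one FASTA record with the sequence wrapped at `width` characters."""
    if width <= 0:
        raise ValueError("width must be positive")
    wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([">" + header] + wrapped) + "\n"


if __name__ == "__main__":
    for description, seq in parse_fasta(EXAMPLE.splitlines()):
        print(description, len(seq))
```

Joining the wrapped lines into one string on read, and re-wrapping on write, keeps the in-memory sequence independent of the line length used in the file.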
The meaning of each letter or symbol is given below.
Nucleotide sequences:
- A --> adenosine
- C --> cytidine
- G --> guanine
- T --> thymidine
- U --> uridine
- R --> G A (purine)
- Y --> T C (pyrimidine)
- K --> G T (keto)
- M --> A C (amino)
- S --> G C (strong)
- W --> A T (weak)
- B --> G T C
- D --> G A T
- H --> A C T
- V --> G C A
- N --> A G C T (any)
- - --> gap of indeterminate length
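The ambiguity codes in the nucleotide table lend themselves to a lookup table. As a rough sketch (the names `NUCLEOTIDE_CODES` and `matches` are hypothetical, and the choice to treat the gap character "-" as matching nothing is an assumption made here for simplicity), a pattern containing ambiguity codes can be checked against a concrete sequence like this:

```python
# Each code maps to the set of concrete bases it may stand for,
# transcribed from the nucleotide table above.
NUCLEOTIDE_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "GA",    # purine
    "Y": "TC",    # pyrimidine
    "K": "GT",    # keto
    "M": "AC",    # amino
    "S": "GC",    # strong
    "W": "AT",    # weak
    "B": "GTC",
    "D": "GAT",
    "H": "ACT",
    "V": "GCA",
    "N": "AGCT",  # any
}


def matches(pattern: str, sequence: str) -> bool:
    """Return True if every base in `sequence` is allowed by the code
    at the same position in `pattern`. Comparison is case-insensitive.
    Characters missing from the table (including '-') match nothing."""
    if len(pattern) != len(sequence):
        return False
    return all(
        base in NUCLEOTIDE_CODES.get(code, "")
        for code, base in zip(pattern.upper(), sequence.upper())
    )
```

For instance, `matches("ARN", "AGT")` holds because R allows G and N allows T, while `matches("AYN", "AGT")` does not because Y allows only T or C.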
Amino acid sequences:
- A alanine
- B aspartate or asparagine
- C cysteine
- D aspartate
- E glutamate
- F phenylalanine
- G glycine
- H histidine
- I isoleucine
- K lysine
- L leucine
- M methionine
- N asparagine
- P proline
- Q glutamine
- R arginine
- S serine
- T threonine
- U selenocysteine
- V valine
- W tryptophan
- Y tyrosine
- Z glutamate or glutamine
- X any
- * translation stop
- - gap of indeterminate length