Listing the Annotations of a Sequence with BioJava

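When you read an annotated sequence file such as GenBank or EMBL, the file provides not only the sequence itself but also more detailed information about it. If a piece of information has a location on the sequence, it is treated as a feature. If it is general information, such as the species name, it is treated as an annotation. A BioJava Annotation object is somewhat like a Map: it holds key-value mappings.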
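To make the map analogy concrete, here is a minimal sketch that uses a plain `java.util.Map` as a stand-in for an Annotation object. It is not BioJava code: the class name `AnnotationAsMap` and the helper `describe` are made up for illustration, and the real Annotation interface has its own methods (such as `getProperty` and `keys`, which the program later in this article uses).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: a plain Map standing in for a BioJava Annotation.
// AnnotationAsMap and describe() are hypothetical names, not part of BioJava.
class AnnotationAsMap {

    // Render every key/value pair as "key : value", one pair per line.
    static String describe(Map<String, Object> annotation) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Object> entry : annotation.entrySet()) {
            out.append(entry.getKey())
               .append(" : ")
               .append(entry.getValue())
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // General, location-free facts about a sequence, keyed by EMBL line code.
        Map<String, Object> annotation = new LinkedHashMap<>();
        annotation.put("OS", "Homo sapiens (human)");
        annotation.put("SV", "AY130859.1");

        System.out.print(describe(annotation));
    }
}
```

A `LinkedHashMap` is used here only so that the pairs print in insertion order; nothing in this sketch says how BioJava orders its keys.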

Here is the beginning of an EMBL file:

  ID AY130859 standard; DNA; HUM; 44226 BP.
  XX
  AC AY130859;
  XX
  SV AY130859.1
  XX
  DT 25-JUL-2002 (Rel. 72, Created)
  DT 25-JUL-2002 (Rel. 72, Last updated, Version 1)
  XX
  DE Homo sapiens cyclin-dependent kinase 7 (CDK7) gene, complete cds.
  XX
  KW .
  XX
  OS Homo sapiens (human)
  OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
  OC Eutheria; Primates; Catarrhini; Hominidae; Homo.
  XX
  RN [1]
  RP 1-44226
  RA Rieder M.J., Livingston R.J., Braun A.C., Montoya M.A., Chung M.-W.,
  RA Miyamoto K.E., Nguyen C.P., Nguyen D.A., Poel C.L., Robertson P.D.,
  RA Schackwitz W.S., Sherwood J.K., Witrak L.A., Nickerson D.A.;
  RT ;
  RL Submitted (11-JUL-2002) to the EMBL/GenBank/DDBJ databases.
  RL Genome Sciences, University of Washington, 1705 NE Pacific, Seattle, WA
  RL 98195, USA
  XX
  CC To cite this work please use: NIEHS-SNPs, Environmental Genome
  CC Project, NIEHS ES15478, Department of Genome Sciences, Seattle, WA
  CC (URL: http://egp.gs.washington.edu).

The following program reads an EMBL file and lists all of its annotation properties. The program's output is shown after the code.
[code lang="java"]
import java.io.*;
import java.util.*;
import org.biojava.bio.*;
import org.biojava.bio.seq.*;
import org.biojava.bio.seq.io.*;

public class ListAnnotations {
    public static void main(String[] args) {
        try {
            // Read the EMBL records from the file named on the command line
            BufferedReader br = new BufferedReader(new FileReader(args[0]));
            SequenceIterator seqs = SeqIOTools.readEmbl(br);

            // For each sequence, list its annotations
            while (seqs.hasNext()) {
                Annotation anno = seqs.nextSequence().getAnnotation();

                // Print each key-value pair
                for (Iterator i = anno.keys().iterator(); i.hasNext(); ) {
                    Object key = i.next();
                    System.out.println(key + " : " + anno.getProperty(key));
                }
            }
        }
        catch (Exception ex) {
            // Covers I/O errors as well as BioJava parsing errors
            ex.printStackTrace();
        }
    }
}
[/code]

Program output:

  RN : [1]
  KW : .
  RL : [Submitted (11-JUL-2002) to the EMBL/GenBank/DDBJ databases., Genome Sciences, University of Washington, 1705 NE Pacific, Seattle, WA, 98195, USA]
  embl_accessions : [AY130859]
  DE : Homo sapiens cyclin-dependent kinase 7 (CDK7) gene, complete cds.
  SV : AY130859.1
  AC : AY130859;
  FH : Key Location/Qualifiers
  XX :
  OC : [Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;, Eutheria; Primates; Catarrhini; Hominidae; Homo.]
  RA : [Rieder M.J., Livingston R.J., Braun A.C., Montoya M.A., Chung M.-W.,, Miyamoto K.E., Nguyen C.P., Nguyen D.A., Poel C.L., Robertson P.D.,, Schackwitz W.S., Sherwood J.K., Witrak L.A., Nickerson D.A.;]
  ID : AY130859 standard; DNA; HUM; 44226 BP.
  DT : [25-JUL-2002 (Rel. 72, Created), 25-JUL-2002 (Rel. 72, Last updated, Version 1)]
  CC : [To cite this work please use: NIEHS-SNPs, Environmental Genome, Project, NIEHS ES15478, Department of Genome Sciences, Seattle, WA, (URL: http://egp.gs.washington.edu).]
  RT : ;
  OS : Homo sapiens (human)
  RP : 1-44226
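
Several values in the output above appear in square brackets (RL, OC, RA, DT, CC). These are the line codes that occur more than once in the file header, and their lines appear to be collected into a list. The sketch below reproduces that grouping with the standard library only. It is an assumption inferred from the output, not BioJava's actual parser; the class `EmblHeaderSketch` and the method `groupByTag` are hypothetical names, and the `XX` separator lines are simply skipped to keep the example short.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: EmblHeaderSketch and groupByTag() are hypothetical,
// not part of BioJava, and only mimic the grouping seen in the output.
class EmblHeaderSketch {

    // Group header lines by their two-letter line code.
    // A code seen once maps to a String; a code seen several times
    // maps to a List<String> holding its lines in file order.
    static Map<String, Object> groupByTag(List<String> lines) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (String line : lines) {
            // Skip blank lines and the XX separator lines
            if (line.length() < 2 || line.startsWith("XX")) {
                continue;
            }
            String tag = line.substring(0, 2);
            String value = line.substring(2).trim();
            Object existing = result.get(tag);
            if (existing == null) {
                // First occurrence: store the bare string
                result.put(tag, value);
            } else if (existing instanceof List) {
                // Third or later occurrence: append to the existing list
                @SuppressWarnings("unchecked")
                List<String> values = (List<String>) existing;
                values.add(value);
            } else {
                // Second occurrence: promote the single string to a list
                List<String> values = new ArrayList<>();
                values.add((String) existing);
                values.add(value);
                result.put(tag, values);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> header = new ArrayList<>();
        header.add("AC AY130859;");
        header.add("XX");
        header.add("DT 25-JUL-2002 (Rel. 72, Created)");
        header.add("DT 25-JUL-2002 (Rel. 72, Last updated, Version 1)");

        for (Map.Entry<String, Object> e : groupByTag(header).entrySet()) {
            System.out.println(e.getKey() + " : " + e.getValue());
        }
    }
}
```

With this input, the single AC line stays a plain string while the two DT lines end up in one bracketed list, which matches the shape of the AC and DT entries in the output above.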
