How to use InterProScan

Running iprscan on its own does not work: you have to add -cli for it to run properly, otherwise it prints a large pile of HTML. Here cli is short for command-line interface.

usage: ./iprscan -cli [-email <addr>] [-appl <name> ...] [-nocrc] [-altjobs] [-seqtype p|n] [-trlen <N>] [-trtable <table>]
              [-iprlookup] [-goterms] -i <seqfile> [-o <output file>]

 -i <seqfile>      Your sequence file (mandatory).
 -o <output file>  The output file where to write results (optional), default is STDOUT.
 -email <addr>     Submitter email address (required for non-interactive).
 -appl <name>      Application(s) to run (optional), default is all.
                   Possible values (dependent on set-up):
                            blastprodom
                            fprintscan
                            hamap
                            hmmpfam
                            hmmpir
                            hmmpanther
                            hmmtigr
                            hmmsmart
                            superfamily
                            gene3d
                            patternscan
                            profilescan
                            seg
                            coils
                            [tmhmm]
                            [signalp]
 -nocrc            Don't perform CRC64 check, default is on.
 -altjobs          Launch jobs alternatively, chunk after chunk. Default is off.
 -seqtype <type>   Sequence type: n for DNA/RNA, p for protein (default).
 -trlen <n>        Transcript length threshold (20-150).
 -trtable <table>  Codon table number.
 -goterms          Show GO terms if iprlookup option is also given.
 -iprlookup        Switch on the InterPro lookup for results.
 -format <format>  Output results format (raw, txt, html, xml(default), ebixml(EBI header on top of xml), gff)
 -verbose          Print messages during run

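Most of the options above are self-explanatory, but a few deserve a comment.
-appl selects the application (database) you want to search. By default every available one is searched. If, for example, you only want Pfam, use "-appl hmmpfam"; to use several specific databases, repeat -appl once per database, e.g. "-appl hmmpfam -appl profilescan".
-iprlookup adds InterPro's own entry accessions to the results, e.g. IPR008237.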
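
As a rough illustration of how the repeated -appl option combines with -iprlookup and -goterms, here is a minimal Python sketch that only assembles the command line. The helper name build_iprscan_cmd, the file names proteins.fasta and result.raw, and the ./iprscan path are all placeholders of mine, not part of InterProScan; the flags themselves are taken from the usage text above.

```python
import shlex


def build_iprscan_cmd(seqfile, apps=(), iprlookup=False, goterms=False,
                      output=None, fmt=None, executable="./iprscan"):
    """Assemble an argv list for the `iprscan -cli` interface shown above.

    Hypothetical helper: it only builds the argument list, it does not
    run anything.
    """
    cmd = [executable, "-cli", "-i", seqfile]
    # -appl is repeated once per application; no -appl means "all".
    for app in apps:
        cmd += ["-appl", app]
    if iprlookup:
        cmd.append("-iprlookup")
    if goterms:
        # Per the help text, GO terms are shown only together with -iprlookup.
        if not iprlookup:
            raise ValueError("-goterms only has an effect with -iprlookup")
        cmd.append("-goterms")
    if output:
        cmd += ["-o", output]
    if fmt:
        cmd += ["-format", fmt]
    return cmd


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    cmd = build_iprscan_cmd("proteins.fasta",
                            apps=["hmmpfam", "profilescan"],
                            iprlookup=True, goterms=True,
                            output="result.raw", fmt="raw")
    # Print a copy-pasteable shell command.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be handed to subprocess.run(cmd) if you would rather launch the search from Python than from the shell.
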
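-nocrc needs some explanation. By default, CRC matching is switched on.
InterProScan's built-in database already holds precomputed search results for a large number of sequences: if your protein sequence is already in the InterPro database, iprscan simply hands back the stored results and no local computation is needed. However, the documentation says: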
currently, InterPro does not store match information for the optional algorithms TMHMM or SignalP. Therefore, if you search a sequence using InterProScan and do not specify the "-nocrc" option (to force searching without doing the CRC calculation and look-up), only the matches that are in the InterPro database will returned and these will NOT include predictions from TMHMM or SignalP. Currently, InterProScan is unable to search these separately but we are working on a solution.
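In short, with the default settings, proteins that are already in the database get no TMHMM or SignalP results in the report (possibly a licensing issue), and iprscan cannot yet run TMHMM or SignalP on their own to fill in the missing results.
So I generally still use -nocrc: it is somewhat slower (many proteins already in the database have to be searched again), but the results are more complete.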
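
The rule of thumb above can be written down as a small check. This is only a sketch of my own reasoning, not anything InterProScan does itself: the helper names needs_nocrc and crc_flags are made up, and treating "no -appl given" as a case that needs -nocrc is an assumption based on the default being all applications, which may include TMHMM and SignalP depending on the set-up.

```python
# Applications for which, per the quoted documentation, InterPro stores
# no precomputed matches.
NOT_PRECOMPUTED = {"tmhmm", "signalp"}


def needs_nocrc(apps):
    """Return True when skipping the CRC lookup looks advisable.

    Hypothetical helper. An empty list means the default (all
    applications), which may include TMHMM/SignalP if they are
    installed, so it is treated as needing -nocrc (an assumption).
    """
    requested = {app.lower() for app in apps}
    return not requested or bool(requested & NOT_PRECOMPUTED)


def crc_flags(apps):
    """Return the extra flag(s) to append to an iprscan command line."""
    return ["-nocrc"] if needs_nocrc(apps) else []


if __name__ == "__main__":
    for apps in (["hmmpfam"], ["hmmpfam", "tmhmm"], []):
        print(apps, crc_flags(apps))
```

If you only search databases whose matches are precomputed, such as hmmpfam, leaving the CRC lookup on should save time; otherwise the slower -nocrc run is the safer choice.
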
