Using PAML: the codeml control file

seqfile = seqfile.phy * sequence data filename

treefile = tree.tre * tree structure file name

outfile = result.mlc * main result file name

*Replace these three lines with your own files: seqfile is the sequence file, treefile is the tree file, and outfile is the result file.
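If you run codeml on many genes, it can help to generate these lines from a script rather than edit them by hand. The following is a minimal sketch: `format_ctl` is a hypothetical helper, not part of PAML, and it only assumes the `key = value` layout shown in this file. The file names are the placeholders used above.

```python
def format_ctl(options):
    """Return control-file text with one 'key = value' line per option."""
    return "".join(f"{key} = {value}\n" for key, value in options.items())

# Placeholder file names from this example; replace them with your own.
file_options = {
    "seqfile": "seqfile.phy",
    "treefile": "tree.tre",
    "outfile": "result.mlc",
}

ctl_text = format_ctl(file_options)
print(ctl_text, end="")

# To save the text for codeml, write it to a file, for example:
# open("codeml.ctl", "w").write(ctl_text)
```

The remaining options in this file can be added to the same dictionary in the order you want them written.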

 

noisy = 3 * 0,1,2,3,9: how much rubbish on the screen

verbose = 1 * 0: concise; 1: detailed, 2: too much

runmode = 0 * 0: user tree; 1: semi-automatic; 2: automatic

* 3: StepwiseAddition; (4,5):PerturbationNNI; -2: pairwise

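*These three lines control how the program processes the data and usually need no changes. For pairwise sequence comparison only, set runmode to -2; no tree file is needed in that case.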

 

seqtype = 1 * 1:codons; 2:AAs; 3:codons-->AAs

CodonFreq = 2 * 0:1/61 each, 1:F1X4, 2:F3X4, 3:codon table

*F3x4: codon frequencies are calculated from average nucleotide frequencies at the three codon positions.

*ndata = 10

clock = 0 * 0: no clock, 1:clock; 2:local clock; 3:CombinedAnalysis

aaDist = 0 * 0:equal, +:geometric; -:linear, 1-6:G1974,Miyata,c,p,v,a

aaRatefile = dat/mtArt.dat * only used for aa seqs with model=empirical(_F)

* dayhoff.dat, jones.dat, wag.dat, mtmam.dat, or your own

*seqtype is 1 for DNA (codon) sequences and 2 for protein sequences; this example uses DNA. The other lines usually need no changes.

 

model = 1

* models for codons:

* 0:one, 1:b, 2:2 or more dN/dS ratios for branches

* models for AAs or codon-translated AAs:

* 0:poisson, 1:proportional, 2:Empirical, 3:Empirical+F

* 6:FromCodon, 7:AAClasses, 8:REVaa_0, 9:REVaa(nr=189)

*Change this value when using branch models or branch-site models.

* 0:one omega ratio for all branches; 1:separate omega for each branch; 2:user specified dN/dS ratios for branches
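With model = 2, the branches that get their own omega are marked in the tree file with labels such as #1. The sketch below shows one way to add such a label to a tip branch. `mark_foreground` is a hypothetical helper, and it assumes a Newick string without branch lengths or spaces around the names; check the PAML manual for where labels go in trees that carry branch lengths.

```python
import re

def mark_foreground(newick, taxon, label="#1"):
    """Append a branch label after one tip name in a Newick string.

    Assumes the tree has no branch lengths and no spaces around names.
    """
    # Match the name only as a whole tip: preceded by '(' or ',' and
    # followed by ',' or ')'.
    pattern = r"(?<=[(,])" + re.escape(taxon) + r"(?=[,)])"
    marked, count = re.subn(pattern, lambda m: f"{m.group(0)} {label}", newick)
    if count != 1:
        raise ValueError(f"expected exactly one tip named {taxon!r}, found {count}")
    return marked

# Made-up four-taxon tree, for illustration only.
tree = "((human,chimp),(mouse,rat));"
print(mark_foreground(tree, "mouse"))
# prints: ((human,chimp),(mouse #1,rat));
```

The function raises an error instead of guessing when the name is missing or occurs more than once, so a mislabeled tree is caught before codeml runs.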

 

*For details on setting the following parameters, see the codon substitution models section of the program manual.

NSsites = 0 * 0:one w;1:neutral;2:selection; 3:discrete;4:freqs;

* 5:gamma;6:2gamma;7:beta;8:beta&w;9:beta&gamma;

* 10:beta&gamma+1; 11:beta&normal>1; 12:0&2normal>1;

* 13:3normal>0

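*This parameter selects the type of site model. Note that under different NSsites settings, the number and meaning of the parameters reported in the results (p, ω, and so on) also differ.

*To decide which model's estimates to trust, compare the models with a likelihood ratio test.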

 

icode = 0 * 0:universal code; 1:mammalian mt; 2-10:see below

Mgene = 0

* codon: 0:rates, 1:separate; 2:diff pi, 3:diff kappa, 4:all diff

* AA: 0:rates, 1:separate

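*In principle these parameters need no changes either, but note that icode must match the type of DNA sequence (the genetic code it uses), because it affects how the program does its calculations.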

 

fix_kappa = 0 * 1: kappa fixed, 0: kappa to be estimated

kappa = 2 * initial or fixed kappa

fix_omega = 0 * 1: omega or omega_1 fixed, 0: estimate

omega = .4 * initial or fixed omega, for codons or codon-based AAs

*PAML requires the user to set the initial values of kappa and omega.

* kappa: ts/tv, the transition/transversion rate ratio; omega: dN/dS

 

fix_alpha = 1 * 0: estimate gamma shape parameter; 1: fix it at alpha

alpha = 0. * initial or fixed alpha, 0:infinity (constant rate)

Malpha = 0 * different alphas for genes

ncatG = 8 * # of categories in dG of NSsites models

getSE = 0 * 0: don't want them, 1: want S.E.s of estimates

*Set getSE to 1 if you need standard errors of the estimates.

RateAncestor = 1 * (0,1,2): rates (alpha>0) or ancestral states (1 or 2)

Small_Diff = .5e-6

cleandata = 1 * remove sites with ambiguity data (1:yes, 0:no)?

*Set this to 1 if the sequences contain gaps or other characters the program cannot read unambiguously.
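Before setting cleandata = 1, it can be useful to estimate how much of the alignment would be left. The sketch below counts codon columns in which every sequence contains only A, C, G or T. `count_clean_codons` is a hypothetical helper that only approximates the idea; codeml's own rules for what counts as ambiguous may differ in detail.

```python
def count_clean_codons(sequences):
    """Count codon columns in which every sequence has only A, C, G or T."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences) or length % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    clean = 0
    for start in range(0, length, 3):
        codons = [seq[start:start + 3].upper() for seq in sequences]
        # A column is clean only if no sequence has a gap or ambiguity code.
        if all(set(codon) <= set("ACGT") for codon in codons):
            clean += 1
    return clean

# Made-up alignment of three sequences and three codons.
alignment = ["ATGAAACCC", "ATG---CCC", "ATGAANCCC"]
total = len(alignment[0]) // 3
print(count_clean_codons(alignment), "of", total, "codons have no gaps or ambiguity")
# prints: 2 of 3 codons have no gaps or ambiguity
```

If only a small fraction of codons is clean, removing the incomplete columns may discard most of the data, which is worth knowing before the run.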

*fix_blength = -1 * 0: ignore, -1: random, 1: initial, 2: fixed

method = 0 * Optimization method 0: simultaneous; 1: one branch a time

 

* Genetic codes: 0:universal, 1:mammalian mt., 2:yeast mt., 3:mold mt.,

* 4: invertebrate mt., 5: ciliate nuclear, 6: echinoderm mt.,

* 7: euplotid mt., 8: alternative yeast nu. 9: ascidian mt., 10: blepharisma nu.

* These codes correspond to transl_table 1 to 11 of GenBank.

 

==================

:) Here are two simple likelihood ratio tests for variable omega among codon sites. Both comparisons use model = 0, so omega varies among sites but is the same on all branches.

1. M1a vs. M2a:   M1a (model=0, NSsites=1), M2a (model=0, NSsites=2), df = 2, LRT = 2dl = abs(2 X (l1 - l0)), where l0 and l1 are the log-likelihoods of the null and the alternative model.

2. M7 vs. M8:   M7 (model=0, NSsites=7), M8 (model=0, NSsites=8), df = 2, LRT = 2dl = abs(2 X (l1 - l0)).

The p-value for the chi-square statistic can be calculated with the chi2 program (CHI2.EXE) included in PAML. If the p-value is below 0.05, the null model is rejected, and we can conclude that some sites are under positive selection.
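The test can also be computed by hand. The sketch below is a minimal illustration, not a replacement for the chi2 program: `lrt_pvalue` is a hypothetical helper, the log-likelihood values are made up, and the closed-form chi-square tail it uses is valid for even degrees of freedom only, which covers df = 2 in both comparisons above.

```python
import math

def lrt_pvalue(lnl_null, lnl_alt, df=2):
    """Return (2 * delta lnL, p-value) for a likelihood ratio test.

    Uses the closed-form chi-square upper tail, valid for even df only.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0:
        # The alternative model did not fit better; nothing to reject.
        return stat, 1.0
    half = stat / 2.0
    tail = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return stat, math.exp(-half) * tail

# Made-up log-likelihoods, for illustration only.
stat, p = lrt_pvalue(lnl_null=-1000.0, lnl_alt=-995.0)
print(f"2dl = {stat:.2f}, p = {p:.4f}")
# prints: 2dl = 10.00, p = 0.0067
```

Because the models are nested, the alternative model should never have the lower log-likelihood. A negative statistic usually points to an optimization problem in one of the runs, so the sketch reports it unchanged instead of taking the absolute value.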

Original source: http://csbl.bmb.uga.edu/~yinyb/codeml.html
