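How large a sample does my study need? If my sample size is already fixed, what is the probability of obtaining a statistically significant result (in other words, is a study of this size worth doing)? Both questions can be answered with power analysis.
Power analysis involves four quantities: sample size, effect size, significance level (alpha), and power. Once any three are known, the fourth can be derived.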
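A Type I error is rejecting H0 when H0 is actually true. The significance level (test level) α is the pre-specified maximum probability of committing a Type I error, commonly set to 0.05 or 0.01. The Type I error probability is also denoted α, and α may be one-tailed or two-tailed. For example, with α = 0.05 and H0 actually true, in theory about 5 out of 100 tests will wrongly reject H0. False positives and misdiagnoses in medicine are errors of this kind.
A Type II error is failing to reject ("accepting") H0 when H0 is actually false. Its probability is denoted β, which is one-tailed only. False negatives and missed diagnoses in medicine are errors of this kind.
Put figuratively, a Type I error discards the truth and convicts an innocent person, while a Type II error accepts a falsehood and lets a guilty person go free. A Zhihu answer (https://www.zhihu.com/question/20993864/answer/30760554) illustrates the two errors vividly with H0: you are not pregnant; H1: you are pregnant. Under these hypotheses, telling someone who is not pregnant that they are pregnant is a Type I error, and telling a pregnant person that they are not pregnant is a Type II error.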
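1-β is called the power of the test. It is the ability to detect a difference at the specified test level α when the two populations really do differ. For example, 1-β = 0.9 means that if the two populations truly differ, in theory 90 out of 100 tests will conclude that the difference is statistically significant.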
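The definitions of α, β, and power can be checked empirically. The following is a minimal simulation sketch in base R, not part of the original notes: the helper name reject_rate, the group size of 20, the standard deviation of 1, the true difference of 1, and the seed are all illustrative assumptions. It repeatedly draws two samples, runs an equal-variance t-test, and records how often H0 is rejected.

```r
# Monte Carlo sketch: share of two-sample t-tests that reject H0 at level alpha.
# All numeric settings here are illustrative assumptions.
set.seed(2024)  # arbitrary seed, only for reproducibility

reject_rate <- function(delta, n = 20, sd = 1, alpha = 0.05, reps = 2000) {
  p_values <- replicate(reps, {
    x <- rnorm(n, mean = 0, sd = sd)      # group 1
    y <- rnorm(n, mean = delta, sd = sd)  # group 2, shifted by the true difference
    t.test(x, y, var.equal = TRUE)$p.value
  })
  mean(p_values < alpha)  # proportion of simulated studies that reject H0
}

type1_rate <- reject_rate(delta = 0)  # H0 true: every rejection is a Type I error
power_est  <- reject_rate(delta = 1)  # H0 false: every rejection is a correct detection
type2_rate <- 1 - power_est           # H0 false but not rejected: Type II error

print(c(type1 = type1_rate, power = power_est, type2 = type2_rate))
```

With H0 true the rejection rate should land near α, and with H0 false it should land near the theoretical power reported by power.t.test(n = 20, delta = 1, sd = 1), up to simulation noise.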
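Each statistical method has its own way of determining sample size, and different software packages may use different formulas. This note uses the t-test for two independent samples as the running example, with PASS, Stata, and R.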
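PASS (Power Analysis & Sample Size) is a world leader in sample size calculation. The 2019 release provides 920 procedures covering sample size and power analysis for a wide range of statistical methods, and the vendor describes it as the best sample size tool on the market. I only have the trial version, in which alpha = 0.15 and CI level = 85% are fixed and cannot be changed, so the PASS steps here serve as a demonstration of the procedure only.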
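In the PASS interface, the required procedure can be located through the Procedures menu on the menu bar, the procedure category tree, or keyword search.
In the category tree, click through Means >> Two Independent Means >> T-Test (Inequality). The procedure window lists 14 procedures related to the two-independent-samples t-test. Select Two-Sample T-Tests Assuming Equal Variance and double-click to open it. Right-clicking the procedure and choosing "Procedure Documentation" opens the help system. The help system is thorough: it covers each procedure's purpose, formulas, caveats, and every step of operation, which makes it the best textbook for PASS. Choose the procedure that matches the study design and the statistical method to be used.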
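Design options
Solve For: Sample Size. This example solves for the sample size.
Alternative Hypothesis: Two-Sided. This example uses a two-tailed test.
Power: 0.90. Several values can be examined at once; enter them directly, separated by spaces.
Alpha: should be 0.05; several values can be examined at once. In my trial version this field is fixed at 0.15 and cannot be changed, so this example only demonstrates the operation.
Group Allocation: Equal (N1 = N2). The relationship between the group sizes; several options are available, such as equal sizes or a fixed ratio. This example uses equal sizes.
Input Type: Difference. This selects how the effect is entered: either directly as the difference between the two means, or as the two means themselves. The standard deviation is entered separately. This example enters the difference.
δ: 1.5 (difference between the two means)
σ: 2.5 (standard deviation)
Several values of δ and σ can be entered to examine them at the same time.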
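To see roughly what these inputs imply, the settings can be plugged into the textbook normal-approximation formula for two equal groups, n per group ≈ 2(z_{1-α/2} + z_{1-β})² σ² / δ². The sketch below is an illustration in base R, not the formula PASS itself uses (PASS documents its own calculation in the Procedure Documentation). The helper name n_normal_approx is hypothetical. It evaluates the approximation at α = 0.05 and at the trial version's α = 0.15, and compares the first with the t-based solution from power.t.test.

```r
# Inputs taken from the design options in this note
delta <- 1.5   # difference between the two means
sigma <- 2.5   # common standard deviation
power <- 0.90  # target power (1 - beta)

# Hypothetical helper: normal-approximation sample size per group, two-sided test
n_normal_approx <- function(alpha) {
  z_alpha <- qnorm(1 - alpha / 2)  # critical value for the two-sided level
  z_beta  <- qnorm(power)          # quantile corresponding to the target power
  2 * (z_alpha + z_beta)^2 * sigma^2 / delta^2
}

n_approx_05 <- n_normal_approx(0.05)  # intended significance level
n_approx_15 <- n_normal_approx(0.15)  # level fixed by the PASS trial version

# t-based solution from base R for comparison (non-central t, solved numerically)
n_exact_05 <- power.t.test(delta = delta, sd = sigma,
                           sig.level = 0.05, power = power)$n

print(c(approx_alpha_0.05 = n_approx_05,
        exact_alpha_0.05  = n_exact_05,
        approx_alpha_0.15 = n_approx_15))
```

The t-based value is slightly larger than the normal approximation because it accounts for estimating the standard deviation. The larger α of the trial version yields a smaller n, which is why the trial-version output cannot be used for a real design at α = 0.05.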
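After completing these settings, click [Calculate] in the upper-left corner.
When several values are entered for a parameter, PASS not only produces the requested power and sample size results but can also generate plots for a visual comparison.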
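The same kind of multi-value exploration can be sketched in base R. The example below is an illustration, not a reproduction of PASS output: the particular grids of power and δ values are assumptions chosen for the demonstration, with σ = 2.5 and α = 0.05 held fixed. It tabulates the required n per group for every combination and, in an interactive session, draws one curve per δ.

```r
# Illustrative grids (assumed values); sd and sig.level are held fixed
powers <- c(0.80, 0.85, 0.90, 0.95)
deltas <- c(1.0, 1.5, 2.0)

grid <- expand.grid(power = powers, delta = deltas)

# Required sample size per group for each combination, rounded up
grid$n_per_group <- mapply(
  function(p, d) {
    ceiling(power.t.test(delta = d, sd = 2.5, sig.level = 0.05, power = p)$n)
  },
  grid$power, grid$delta
)

print(grid)

# Draw the curves only in an interactive session, one line per delta
if (interactive()) {
  plot(range(powers), range(grid$n_per_group), type = "n",
       xlab = "Power", ylab = "Sample size per group")
  for (d in deltas) {
    rows <- grid[grid$delta == d, ]
    lines(rows$power, rows$n_per_group, type = "b")
  }
}
```

Reading across the table shows the expected pattern: the required sample size grows as the target power increases and shrinks as the difference to be detected gets larger.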
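【2】Stata procedure
Menu: Statistics >> Power, precision, and sample size, then choose the test comparing two independent means. In the power analysis dialog for the two-sample means test that opens, enter the corresponding settings.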
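Many R packages implement power analysis and sample size calculation for different study designs.
This example calculates the sample size for a two-independent-samples t-test, for which the functions power.t {powerAnalysis}, pwr.t.test {pwr}, and power.t.test {stats} can be used.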
power.t(es=NULL, n=NULL, power=NULL, sig.level=NULL, ratio=1, type=c("two","paired","one","unequal"), alternative=c("two.sided","left","right"))
es: effect size, difference between the means divided by the pooled standard deviation;
n: total number of observations/pairs;
power: power of study;
sig.level: significance level;
ratio: the ratio of sample size 1 to sample size 2. Only used when 'type' is "unequal";
type: type of t test, must be one of "one", "two" (default), "paired", or "unequal". "one" means one-sample t test, which tests whether the population mean is equal to a specified value. "two"/"unequal" means two-sample (equal size/unequal size) t test, which is used to ascertain how likely an observed mean difference between two groups would be to occur by chance alone. "paired" means paired t-test (also called the correlated t-test and the t-test for dependent means), which is used to ascertain how likely the difference between two means that contain the same (or matched) observations is to occur by chance alone;
alternative: one- or two-sided test, must be one of "two.sided" (default), "left", "right".
pwr.t.test(n=NULL, d=NULL, sig.level=0.05, power=NULL, type=c("two.sample","one.sample","paired"), alternative=c("two.sided","less","greater"))
n: number of observations (per sample);
d: effect size (Cohen's d), the difference between the means divided by the pooled standard deviation;
sig.level: significance level (Type I error probability);
power: power of test (1 minus Type II error probability);
type: type of t test: one-sample, two-sample, or paired;
alternative: a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less".
power.t.test(n=NULL, delta=NULL, sd=1, sig.level=0.05, power=NULL, type=c("two.sample","one.sample","paired"), alternative=c("two.sided","one.sided"), strict=FALSE, tol=.Machine$double.eps^0.25)
n: number of observations (per group);
delta: true difference in means;
sd: standard deviation;
sig.level: significance level (Type I error probability);
power: power of test (1 minus Type II error probability);
type: string specifying the type of t test. Can be abbreviated;
alternative: one- or two-sided test. Can be abbreviated;
strict: use strict interpretation in two-sided case;
tol: numerical tolerance used in root finding, the default providing (at least) four significant digits.
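The "know three, solve the fourth" idea maps directly onto the arguments of power.t.test from the base stats package: whichever of n, delta, power, or sig.level is left as NULL is the quantity solved for. The sketch below illustrates this with this note's δ = 1.5 and σ = 2.5. The fixed group size of 40 is an assumed value chosen only for the demonstration.

```r
# Solve for n: delta, sd, sig.level, and power are given
n_needed <- power.t.test(delta = 1.5, sd = 2.5,
                         sig.level = 0.05, power = 0.90)$n

# Solve for power: the sample size is already fixed (40 per group, assumed)
power_achieved <- power.t.test(n = 40, delta = 1.5, sd = 2.5,
                               sig.level = 0.05)$power

# Solve for delta: the smallest difference detectable with 40 per group
delta_detectable <- power.t.test(n = 40, sd = 2.5,
                                 sig.level = 0.05, power = 0.90)$delta

# Solve for sig.level: it must be set to NULL explicitly,
# because its default is 0.05 rather than NULL
alpha_implied <- power.t.test(n = 40, delta = 1.5, sd = 2.5,
                              sig.level = NULL, power = 0.90)$sig.level

print(c(n_needed = n_needed, power_achieved = power_achieved,
        delta_detectable = delta_detectable, alpha_implied = alpha_implied))
```

The second call answers the opening question about a study whose sample size is already fixed: with 40 per group the power falls short of 0.90, so either a larger difference, a larger α, or more participants would be needed.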
The R commands for the three approaches are listed below:
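library(powerAnalysis) # load the powerAnalysis package
power.t(es=1.5/2.5,power=0.9,sig.level=0.05,type="two",alternative="two.sided") # solve for the sample size with power.t from powerAnalysis; given any three of es, n, power, and sig.level, the fourth is solved for
library(pwr) # load the pwr package, which provides pwr.t.test
pwr.t.test(d=1.5/2.5,power=0.9,sig.level=0.05,type="two.sample",alternative="two.sided")
power.t.test(power=0.90,delta=1.5,sd=2.5) # from the base stats package; sig.level defaults to 0.05 with a two-sided, two-sample test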
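These functions return a fractional sample size, which has to be rounded up to a whole number of participants before use. The short sketch below uses only power.t.test from base R, so it runs without extra packages. It rounds the result up and then feeds the rounded size back in to confirm the power actually achieved. The variable names are illustrative.

```r
# Solve for the per-group sample size (sig.level defaults to 0.05, two-sided)
res <- power.t.test(power = 0.90, delta = 1.5, sd = 2.5)

n_per_group <- ceiling(res$n)   # always round up, never to the nearest integer
total_n     <- 2 * n_per_group  # two equal groups

# Plug the rounded size back in to see the power it actually gives
achieved_power <- power.t.test(n = n_per_group, delta = 1.5, sd = 2.5)$power

print(c(n_per_group = n_per_group, total_n = total_n,
        achieved_power = achieved_power))
```

Rounding up guarantees that the achieved power is at least the target of 0.90, whereas rounding down would leave the study slightly underpowered.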