计算OR值(odds ratio、比值比、优势比)

来源:Kai评论36,095

Odds ratio(OR)从字面上可看出,是两个odds的ratio,其用于:

在病例对照研究(case-control study)中,分析暴露风险因素与疾病(或者用药)的关联程度;主要是反映暴露与疾病之间关联强度的指标,OR常适用于病例对照研究,也可以运用于前瞻性的研究(当观察时间相等时)

与其相似的有个指标relative risk(RR),其可以理解为risk ratio,用于:

在队列研究(cohort study)中,分析暴露因素与发病的关联程度;主要是反映暴露与发病(死亡)关联强度的最有用的指标,RR适用于队列研究或随机对照试验。

以一个例子来说明两者的区别,数据表格如下(Mutated gene对应暴露风险因素,Cancer对应疾病):

CancerNormalTotal
Mutated gene23117140
No mutated gene6210216
Total29327356

OR = (23/117) / (6/210) = 6.88RR = (23/140) / (6/216) = 5.91

从上可看出,OR表明暴露组的疾病风险程度是非暴露组的6.88倍,RR表明暴露组发病的风险是非暴露组的5.91倍

OR值的统计学意义:

  • OR>1,暴露与疾病的危险度增加,两者呈正相关
  • OR<1,暴露与疾病的危险度减少,两者呈负相关
  • OR=1,暴露与疾病的危险度无关,两者呈不相关

RR值的统计学意义:

  • OR>1,暴露因素是疾病的危险因素,两者呈正相关
  • OR<1,暴露因素是疾病的保护因素,两者呈负相关
  • OR=1,暴露因素与疾病无关,两者呈不相关

注意点:

当疾病的incidence rate较低时,OR近似于RR,故当疾病很罕见时,常用OR来作为RR的近似值;然而当incidence rate高于10%的时候,OR与RR的差距会变得越来越大,从而使得在这些情况下使用OR就变得并不那么合适了(OR会倾向于给出一个暴露 vs. 非暴露间差距更明显的值,因此导致临床意义不足)

为什么在病例对照研究(case-control study)中无法计算RR值?

因为我们一开始选定的人群是基于他们发没发生event来定的,所以这时候我们这个研究群体里的的incidence rate并不是target population里真实的incidence rate (事实上,case-control study里的incidence rate一般会远大于实际的incidence rate,因为做case-control study的初衷就是因为target population里的event rate太低),所以我们没法计算RR


Odds ratio(OR)的计算方法

StatQuest教程中StatQuest: Odds Ratios and Log(Odds Ratios)这节讲到了如何计算OR值以及P值(statistical significance),大致可以分为3种方法:

  • Fisher’s Exact Test
  • Chi-Square Test
  • The Wald Test (对应常用的logistic regression)

以上述数据表格为例:

  1. dat <- matrix(c(23, 6, 117, 210), nrow = 2, ncol = 2)
  2. rownames(dat) <- c("Mutated gene", "No mutated gene")
  3. colnames(dat) <- c("Cancer", "Normal")
Fisher’s Exact Test

使用fisher.test函数即可计算P值及OR值,以及置信区间

  1. > fisher.test(dat)
  2.  
  3. Fisher's Exact Test for Count Data
  4.  
  5. data: dat
  6. p-value = 1.099e-05
  7. alternative hypothesis: true odds ratio is not equal to 1
  8. 95 percent confidence interval:
  9. 2.613152 21.139349
  10. sample estimates:
  11. odds ratio
  12. 6.842952
Chi-Square Test

使用chisq.test函数,不对P值做校正的话,加上correct = F参数

  1. > chisq.test(dat, correct = F)
  2.  
  3. Pearson's Chi-squared test
  4.  
  5. data: dat
  6. X-squared = 21.154, df = 1, p-value = 4.237e-06
epitools package

如果想同时看Fisher’s Exact Test和Chi-Square Test的结果,并计算OR值的话,可以考虑用epitools包(注意原始输入数据的格式,需要先翻转下),如:

  1. dat2 <- matrix(c(6, 23, 210, 117), nrow = 2, ncol = 2)
  2. rownames(dat2) <- c("No mutated gene", "Mutated gene")
  3. colnames(dat2) <- c("Normal", "Cancer")
  4.  
  5. library(epitools)
  6. > epitools::oddsratio(dat2, correction = F, rev = "c")
  7. $data
  8. Cancer Normal Total
  9. No mutated gene 210 6 216
  10. Mutated gene 117 23 140
  11. Total 327 29 356
  12.  
  13. $measure
  14. NA
  15. odds ratio with 95% C.I. estimate lower upper
  16. No mutated gene 1.000000 NA NA
  17. Mutated gene 6.717846 2.805078 18.87268
  18.  
  19. $p.value
  20. NA
  21. two-sided midp.exact fisher.exact chi.square
  22. No mutated gene NA NA NA
  23. Mutated gene 6.572274e-06 1.098703e-05 4.237152e-06
  24.  
  25. $correction
  26. [1] FALSE
  27.  
  28. attr(,"method")
  29. [1] "median-unbiased estimate & mid-p exact CI"

其同样也可以计算RR值

  1. epitools::riskratio(dat2, correction = F, rev = "c")
fmsb package

还可以用fmsb包计算OR值及置信区间(跟SAS结果一致。。。)

  1. library(fmsb)
  2. > fmsb::oddsratio(dat)
  3. Disease Nondisease Total
  4. Exposed 23 117 140
  5. Nonexposed 6 210 216
  6. Total 29 327 356
  7.  
  8. Odds ratio estimate and its significance probability
  9.  
  10. data: dat
  11. p-value = 4.371e-06
  12. 95 percent confidence interval:
  13. 2.724202 17.377236
  14. sample estimates:
  15. [1] 6.880342
logistic regression

logistic regression,即假设error terms服从binomial distribution,并使用logit作为link function;然后通过model计算出变量对应的logit(p),即logodds,odds则是等于exp(logodds),而p(predict probabilities )则是odds/(1+odds)

对于Odd Ratios在Logistic regression中的理解可以看:

通过glm函数对数据进行拟合(观察female变量与hon之间的影响)

  1. data <- read.csv("https://stats.idre.ucla.edu/wp-content/uploads/2016/02/sample.csv")
  2. > head(data)
  3. female read write math hon femalexmath predicted predicted2
  4. 1 0 57 52 41 0 0 -1.4708517 -3.3839875
  5. 2 1 68 59 53 0 53 -0.8780695 -1.5079033
  6. 3 0 44 33 54 0 0 -1.4708517 -1.3515629
  7. 4 0 63 44 47 0 0 -1.4708517 -2.4459454
  8. 5 0 47 52 57 0 0 -1.4708517 -0.8825418
  9. 6 0 44 52 51 0 0 -1.4708517 -1.8205840
  10.  
  11. f1<-glm(hon~female,data = data,family = binomial)
  12. # summary(f1)$coeff
  13. > summary(f1)
  14.  
  15. Call:
  16. glm(formula = hon ~ female, family = binomial, data = data)
  17.  
  18. Deviance Residuals:
  19. Min 1Q Median 3Q Max
  20. -0.8337 -0.8337 -0.6431 -0.6431 1.8317
  21.  
  22. Coefficients:
  23. Estimate Std. Error z value Pr(>|z|)
  24. (Intercept) -1.4709 0.2690 -5.469 4.53e-08 ***
  25. female 0.5928 0.3414 1.736 0.0825 .
  26. ---
  27. Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 1
  28.  
  29. (Dispersion parameter for binomial family taken to be 1)
  30.  
  31. Null deviance: 222.71 on 199 degrees of freedom
  32. Residual deviance: 219.61 on 198 degrees of freedom
  33. AIC: 223.61
  34.  
  35. Number of Fisher Scoring iterations: 4

从上可看出,每一单位female的变化(在此例子中相当于从0变成1),hon的log adds增加0.5928,即回归系数(logistic regression coefficients)

查看回归系数以及对应的显著性P值(默认是用)

  1. coef(summary(g))["g2",c("Estimate","Pr(>|z|)")]

从回归系数可计算出OR值(1.8090145)以及置信区间(0.9362394 - 3.5929859)

  1. > exp(cbind(OR = coef(f1), confint(f1)))
  2. Waiting for profiling to be done...
  3. OR 2.5 % 97.5 %
  4. (Intercept) 0.2297297 0.1312460 0.3792884
  5. female 1.8090145 0.9362394 3.5929859
  6. # confint.default(f1)

按照公式,OR值也可以手动计算:

  1. data$predicted<-predict(f1)
  2. # Calculate log odds
  3. s1 <-data$predicted[data$female==0][1]
  4. s2 <-data$predicted[data$female==1][1]
  5. odd_ratio<-exp(s1-s2)

predict probabilities从公式上可得是odds/(1+odds),从上述可的female变量对应的log odds,然后转化成odds后即可计算,如:

  1. exp(s2)/(1 + exp(s2))
  2. # exp(s1)/(1 + exp(s1))

或者通过下述函数也可直接出结果

  1. predict(f1, type = "response")

绘制拟合曲线散点图(这个示例数据不太合适展示,拟合效果有点差,因此不展示了)

  1. # f2<-glm(hon~math,data = data,family = binomial)
  2. #
  3. # library(dplyr)
  4. # dt <-data %>%
  5. # group_by(math,hon) %>%
  6. # summarise(freq=n()) %>%
  7. # mutate(all=sum(freq),prob=freq/all,odds=prob/(1-prob),logodds=log(odds)) %>%
  8. # round(.,5)
  9. #
  10. # data$fit <- predict(f2, data, type = "response")
  11. #
  12. # dt <- left_join(dt, data[,c("math", "fit")])
  13. # library(ggplot2)
  14. # ggplot(dt, aes(x=math, y=prob)) +
  15. # geom_point() +
  16. # geom_line(aes(x=math, y=fit))

参考资料

发表评论

匿名网友

拖动滑块以完成验证
加载失败