Odds ratio(OR)从字面上可看出,是两个odds的ratio,其用于:
与其相似的有个指标relative risk(RR),其可以理解为risk ratio,用于:
以一个例子来说明两者的区别,数据表格如下(Mutated gene对应暴露风险因素,Cancer对应疾病):
则 从上可看出,OR表明暴露组的疾病风险程度是非暴露组的6.88倍,RR表明暴露组发病的风险是非暴露组的5.91倍 OR值的统计学意义:
RR值的统计学意义:
注意点:
为什么在病例对照研究(case-control study)中无法计算RR值?
Odds ratio(OR)的计算方法StatQuest教程中StatQuest: Odds Ratios and Log(Odds Ratios)这节讲到了如何计算OR值以及P值(statistical significance),大致可以分为3种方法:
以上述数据表格为例: dat <- matrix(c(23, 6, 117, 210), nrow = 2, ncol = 2) rownames(dat) <- c("Mutated gene", "No mutated gene") colnames(dat) <- c("Cancer", "Normal") Fisher’s Exact Test使用 > fisher.test(dat) Fisher's Exact Test for Count Data data: dat p-value = 1.099e-05 alternative hypothesis: true odds ratio is not equal to 1 95 percent confidence interval: 2.613152 21.139349 sample estimates: odds ratio 6.842952 Chi-Square Test使用 > chisq.test(dat, correct = F) Pearson's Chi-squared test data: dat X-squared = 21.154, df = 1, p-value = 4.237e-06 epitools package如果想同时看Fisher’s Exact Test和Chi-Square Test的结果,并计算OR值的话,可以考虑用 dat2 <- matrix(c(6, 23, 210, 117), nrow = 2, ncol = 2) rownames(dat2) <- c("No mutated gene", "Mutated gene") colnames(dat2) <- c("Normal", "Cancer") library(epitools) > epitools::oddsratio(dat2, correction = F, rev = "c") $data Cancer Normal Total No mutated gene 210 6 216 Mutated gene 117 23 140 Total 327 29 356 $measure NA odds ratio with 95% C.I. estimate lower upper No mutated gene 1.000000 NA NA Mutated gene 6.717846 2.805078 18.87268 $p.value NA two-sided midp.exact fisher.exact chi.square No mutated gene NA NA NA Mutated gene 6.572274e-06 1.098703e-05 4.237152e-06 $correction [1] FALSE attr(,"method") [1] "median-unbiased estimate & mid-p exact CI" 其同样也可以计算RR值 epitools::riskratio(dat2, correction = F, rev = "c") fmsb package还可以用 library(fmsb) > fmsb::oddsratio(dat) Disease Nondisease Total Exposed 23 117 140 Nonexposed 6 210 216 Total 29 327 356 Odds ratio estimate and its significance probability data: dat p-value = 4.371e-06 95 percent confidence interval: 2.724202 17.377236 sample estimates: [1] 6.880342 logistic regression
对于Odd Ratios在Logistic regression中的理解可以看:
通过 data <- read.csv("https://stats.idre.ucla.edu/wp-content/uploads/2016/02/sample.csv") > head(data) female read write math hon femalexmath predicted predicted2 1 0 57 52 41 0 0 -1.4708517 -3.3839875 2 1 68 59 53 0 53 -0.8780695 -1.5079033 3 0 44 33 54 0 0 -1.4708517 -1.3515629 4 0 63 44 47 0 0 -1.4708517 -2.4459454 5 0 47 52 57 0 0 -1.4708517 -0.8825418 6 0 44 52 51 0 0 -1.4708517 -1.8205840 f1<-glm(hon~female,data = data,family = binomial) # summary(f1)$coeff > summary(f1) Call: glm(formula = hon ~ female, family = binomial, data = data) Deviance Residuals: Min 1Q Median 3Q Max -0.8337 -0.8337 -0.6431 -0.6431 1.8317 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -1.4709 0.2690 -5.469 4.53e-08 *** female 0.5928 0.3414 1.736 0.0825 . --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 222.71 on 199 degrees of freedom Residual deviance: 219.61 on 198 degrees of freedom AIC: 223.61 Number of Fisher Scoring iterations: 4 从上可看出,每一单位female的变化(在此例子中相当于从0变成1),hon的log adds增加0.5928,即回归系数(logistic regression coefficients) 查看回归系数以及对应的显著性P值(默认是用) coef(summary(g))["g2",c("Estimate","Pr(>|z|)")] 从回归系数可计算出OR值(1.8090145)以及置信区间(0.9362394 - 3.5929859) > exp(cbind(OR = coef(f1), confint(f1))) Waiting for profiling to be done... OR 2.5 % 97.5 % (Intercept) 0.2297297 0.1312460 0.3792884 female 1.8090145 0.9362394 3.5929859 # confint.default(f1) 按照公式,OR值也可以手动计算: data$predicted<-predict(f1) # Calculate log odds s1 <-data$predicted[data$female==0][1] s2 <-data$predicted[data$female==1][1] odd_ratio<-exp(s1-s2) predict probabilities从公式上可得是 exp(s2)/(1 + exp(s2)) # exp(s1)/(1 + exp(s1)) 或者通过下述函数也可直接出结果 predict(f1, type = "response") 绘制拟合曲线散点图(这个示例数据不太合适展示,拟合效果有点差,因此不展示了) # f2<-glm(hon~math,data = data,family = binomial) # # library(dplyr) # dt <-data %>% # group_by(math,hon) %>% # summarise(freq=n()) %>% # mutate(all=sum(freq),prob=freq/all,odds=prob/(1-prob),logodds=log(odds)) %>% # round(.,5) # # data$fit <- predict(f2, data, type = "response") # # dt <- left_join(dt, data[,c("math", "fit")]) # library(ggplot2) # ggplot(dt, aes(x=math, y=prob)) + # geom_point() + # geom_line(aes(x=math, y=fit)) 参考资料 |