
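This is the fifth post in our literature-sharing series on clinical big-data research. It covers one of the clinical big-data column articles written by Zhongheng Zhang of Zhejiang University and published in Annals of Translational Medicine. The article introduces a model-building strategy for logistic regression. It is shared here for study and discussion only; copyright remains with the original author.
Abstract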
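Logistic regression is one of the most commonly used models for handling confounding in the medical literature. This article shows how to carry out the purposeful selection model-building strategy in R. The author focuses on using the likelihood ratio test to see whether deleting a variable has a significant effect on model fit. A deleted variable should also be checked to determine whether it is an important adjustment for the remaining covariates. Interactions should be examined to clarify complex relationships among the covariates and their synergistic effect on the response variable. The model's goodness of fit (GOF), that is, how well the fitted model reflects the real data, should be checked as well. The Hosmer-Lemeshow GOF test is the most widely used test for logistic regression models.
Introduction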
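The likelihood ratio check described above compares a full model with a reduced model that omits one variable, using G = 2 × (logLik_full − logLik_reduced), which is referred to a chi-square distribution with 1 degree of freedom. The article itself works in R; what follows is only a minimal, dependency-free Python sketch of the same idea. The data are simulated, the covariate names (`lactate`, `age`, `noise`) are hypothetical, and the helper functions are written for this illustration rather than taken from the article. The last step also computes the percentage change in a remaining coefficient after the deletion; a cut-off such as 20% is often quoted for that check, but that threshold is an assumption here and is not stated in this summary.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_logistic(X, y, iters=50, tol=1e-10):
    """Newton-Raphson fit; returns (coefficients, maximized log-likelihood)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            p = 1.0 / (1.0 + math.exp(-eta))
            for r in range(k):
                grad[r] += (yi - p) * xi[r]
                for c in range(k):
                    hess[r][c] += p * (1.0 - p) * xi[r] * xi[c]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    loglik = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        loglik += yi * eta - math.log1p(math.exp(eta))
    return beta, loglik


def lr_test(loglik_full, loglik_reduced):
    """Likelihood ratio statistic G and its p-value for one dropped term (1 df)."""
    g = 2.0 * (loglik_full - loglik_reduced)
    # For 1 df the chi-square upper tail equals erfc(sqrt(G / 2)).
    return g, math.erfc(math.sqrt(max(g, 0.0) / 2.0))


def drop_column(X, j):
    """Return a copy of the design matrix without column j."""
    return [row[:j] + row[j + 1:] for row in X]


# Simulated data (hypothetical): the outcome depends on lactate and age only.
rng = random.Random(42)
X, y = [], []
for _ in range(400):
    lactate, age, noise = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    p = 1.0 / (1.0 + math.exp(-(-0.5 + 1.0 * lactate + 0.5 * age)))
    X.append([1.0, lactate, age, noise])  # column 0 is the intercept
    y.append(1 if rng.random() < p else 0)

beta_full, ll_full = fit_logistic(X, y)
beta_no_noise, ll_no_noise = fit_logistic(drop_column(X, 3), y)
_, ll_no_lactate = fit_logistic(drop_column(X, 1), y)

g_noise, p_noise = lr_test(ll_full, ll_no_noise)
g_lactate, p_lactate = lr_test(ll_full, ll_no_lactate)

# Percentage change in the lactate coefficient after deleting `noise`.
delta_lactate = 100.0 * abs(beta_no_noise[1] - beta_full[1]) / abs(beta_full[1])

print(f"drop noise:   G = {g_noise:.3f}, p = {p_noise:.4f}")
print(f"drop lactate: G = {g_lactate:.3f}, p = {p_lactate:.3g}")
print(f"change in lactate coefficient after dropping noise: {delta_lactate:.2f}%")
```

A small p-value argues for keeping the variable. A large p-value together with a small change in the remaining coefficients suggests the variable can be dropped. In R, the same comparison is usually made by fitting both models with `glm(..., family = binomial)` and comparing them with `anova(reduced, full, test = "Chisq")`.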
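The logistic regression model is one of the most widely used models in the medical literature for investigating the independent effect of a variable on a binary outcome. However, many studies do not state their model-building strategy explicitly, which compromises the reliability and reproducibility of their results. Several model-building strategies have been reported in the literature, such as purposeful selection of variables, stepwise selection, and best subsets. None of them has been proven superior to the others, and model building is "part science, part statistical methods, and part experience and common sense". The principle of model building is to select as few variables as possible while the resulting (parsimonious) model still reflects the true outcomes of the data. In this article, the author shows how to perform purposeful selection in R. Variable selection is the first step of model building; the other steps will be covered in subsequent articles.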
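For readers who want the model in symbols, the standard form of a logistic regression with k covariates is sketched below. The notation (p, x_j, β_j) is ours and is not taken from the article.

```latex
\operatorname{logit}(p) = \ln\frac{p}{1-p}
  = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k,
\qquad p = \Pr(Y = 1 \mid x_1, \ldots, x_k)
```

Here exp(β_j) is the odds ratio for a one-unit increase in x_j with the other covariates held fixed, which is the sense in which the model estimates an "independent" effect. Variable selection decides which x_j appear on the right-hand side.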
References
Cite this article as: Zhang Z. Model building strategy for logistic regression: purposeful selection. Ann Transl Med 2016;4(6):111. doi: 10.21037/atm.2016.02.15