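Application of Data Mining Technology to Medical Data
Chinese Abstract
With the development of big data and artificial intelligence technologies, data mining has been applied in a growing number of fields, including finance, education, and healthcare. Within healthcare, its applications extend to frontier areas such as precision medicine, genetic engineering, and gene sequencing. This article discusses the role that data mining models and algorithms play in clinical medical data and hospital information system data.
The purpose of applying data mining to medical data is to uncover latent disease-related factors in large volumes of medical data and, in the process, to obtain further information, models, and association rules. Applying these findings in clinical practice can help doctors diagnose diseases faster and more accurately. The main work of this article is as follows:
First, Chapter 2 describes in detail the characteristics of medical data and the theoretical foundations and structure of commonly used data mining algorithms, and gives a brief explanation of each data mining model.
Second, using a breast cancer-related medical data set, this article explores how logistic regression prediction and random forest (decision tree) classification perform as classifiers on medical data, and achieves good classification accuracy. The approach could then serve as a diagnostic aid for doctors: patients predicted to have a higher probability of breast cancer can be given priority in observation and diagnosis.
Finally, this article interprets the classification and prediction results obtained on the two data sets and proposes related countermeasures and suggestions for improvement. It closes with a discussion of the article's limitations and directions for future improvement.
Keywords: data mining; regression analysis; decision tree; breast cancer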
Application of Data Mining Technology to Medical Data
Abstract
With the development of big data and artificial intelligence technologies, data mining has become a hot topic and has been applied in many fields, including finance, education, and healthcare. Within healthcare, its applications cover precision medicine, genetic engineering, gene sequencing, and other frontier fields. This article discusses the role of data mining models and algorithms in clinical medical data and hospital information system data.
The purpose of applying data mining to medical data is to uncover latent disease-related factors in large volumes of medical data and, in the process, to obtain further information, models, and association rules. Applying these findings in clinical practice can help doctors diagnose diseases faster and more accurately. The main work of this article is as follows:
First, Chapter 2 of this article describes the characteristics of medical data and the theoretical basis and structure of commonly used data mining algorithms. It also briefly explains the various data mining models.
Second, using a breast cancer-related medical data set, this article explores how well logistic regression and random forest (decision tree) models classify medical data. The classification results show good accuracy. Such models could serve as a diagnostic aid, helping doctors focus observation and diagnosis on patients predicted to have a higher probability of breast cancer.
Finally, this article interprets the classification and prediction results obtained on the two data sets and puts forward relevant countermeasures and suggestions for improvement. It closes by discussing the article's limitations and directions for future improvement.
Keywords: data mining; regression analysis; decision tree; breast cancer
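
The idea of scoring patients with logistic regression and then flagging those with a high predicted probability can be illustrated with a minimal sketch. It is not the implementation used in this work: the data are synthetic, the two features stand in for hypothetical standardized measurements, the model is fitted by plain batch gradient descent, and the 0.8 risk threshold is an assumed value chosen only for illustration. The random forest classifier is not covered here.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Predicted probability of the positive class for one sample."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of log-loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Synthetic stand-in data (NOT a real medical data set):
# class 0 clusters around (-1, -1), class 1 around (1, 1).
rng = random.Random(0)
X, y = [], []
for _ in range(100):
    X.append([rng.gauss(-1.0, 0.7), rng.gauss(-1.0, 0.7)])
    y.append(0)
    X.append([rng.gauss(1.0, 0.7), rng.gauss(1.0, 0.7)])
    y.append(1)

w, b = train_logistic(X, y)

# Training accuracy with the usual 0.5 decision threshold.
accuracy = sum(
    (predict_proba(w, b, xi) >= 0.5) == bool(yi) for xi, yi in zip(X, y)
) / len(X)

# Assumed threshold: samples at or above it are flagged for closer observation.
RISK_THRESHOLD = 0.8
flagged = [i for i, xi in enumerate(X) if predict_proba(w, b, xi) >= RISK_THRESHOLD]

print(f"training accuracy: {accuracy:.3f}")
print(f"samples flagged for closer observation: {len(flagged)} of {len(X)}")
```

In practice the threshold would be chosen with clinicians, and accuracy would be measured on held-out data, not on the training set as in this toy example.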