Paper information
Deep learning algorithms for detection of critical findings in head CT scans: a retrospective study
Sasank Chilamkurthy, Qure.ai, India
The Lancet, 2018 (journal article)
Impact Score: 79.321
Motivation
- We aimed to develop and validate a set of deep learning algorithms for automated detection of the following key findings from these scans: intracranial haemorrhage and its types (i.e., intraparenchymal, intraventricular, subdural, extradural, and subarachnoid); calvarial fractures; midline shift; and mass effect;
- Our results show that deep learning algorithms can accurately identify head CT scan abnormalities requiring urgent attention, opening up the possibility to use these algorithms to automate the triage process.
Contribution
- We retrospectively collected a dataset containing 313,318 head CT scans together with their clinical reports from around 20 centres in India between Jan 1, 2011, and June 1, 2017. A randomly selected part of this dataset (Qure25k dataset) was used for validation and the rest was used to develop algorithms. An additional validation dataset (CQ500 dataset) was collected in two batches from centres that were different from those used for the development and Qure25k datasets. The Qure25k dataset contained 21,095 scans (mean age 43 years; 9030 [43%] female patients), and the CQ500 dataset consisted of 214 scans in the first batch (mean age 43 years; 94 [44%] female patients) and 277 scans in the second batch (mean age 52 years; 84 [30%] female patients);
- On the Qure25k dataset, the algorithms achieved an AUC of 0·92 (95% CI 0·91–0·93) for detecting intracranial haemorrhage (0·90 [0·89–0·91] for intraparenchymal, 0·96 [0·94–0·97] for intraventricular, 0·92 [0·90–0·93] for subdural, 0·93 [0·91–0·95] for extradural, and 0·90 [0·89–0·92] for subarachnoid). On the CQ500 dataset, AUC was 0·94 (0·92–0·97) for intracranial haemorrhage (0·95 [0·93–0·98], 0·93 [0·87–1·00], 0·95 [0·91–0·99], 0·97 [0·91–1·00], and 0·96 [0·92–0·99], respectively). AUCs on the Qure25k dataset were 0·92 (0·91–0·94) for calvarial fractures, 0·93 (0·91–0·94) for midline shift, and 0·86 (0·85–0·87) for mass effect, while AUCs on the CQ500 dataset were 0·96 (0·92–1·00), 0·97 (0·94–1·00), and 0·92 (0·89–0·95), respectively;
- To our knowledge, our study is the first to describe the development of a system that separately identifies critical abnormalities on head CT scans.
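The results above are reported as an AUC with a 95% confidence interval. As a rough illustration of what such a pair of numbers means, the sketch below computes a rank-based AUC and a percentile bootstrap interval. The notes do not say how the paper derived its intervals, so the bootstrap here is an assumption chosen for simplicity, and the function names `auc_rank` and `bootstrap_auc_ci` are made up for this example.

```python
import numpy as np

def auc_rank(y_true, y_score):
    """AUC as the probability that a random positive scan scores higher
    than a random negative scan (ties count as 0.5)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true]
    neg = y_score[~y_true]
    # All positive/negative pairs; fine for small illustrative arrays.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus percentile bootstrap interval (an assumed method,
    not necessarily the one used in the paper)."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    n = y_true.size
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample scans with replacement
        sample = y_true[idx]
        if sample.all() or not sample.any():
            continue  # AUC is undefined when one class is missing
        aucs.append(auc_rank(sample, y_score[idx]))
    lo, hi = np.quantile(aucs, [alpha / 2, 1 - alpha / 2])
    return auc_rank(y_true, y_score), float(lo), float(hi)
```

Calling `bootstrap_auc_ci(labels, scores)` returns a tuple `(auc, lower, upper)`; a narrower interval on Qure25k than on CQ500 would be expected simply because Qure25k has far more scans.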
Approach
- First, a natural language processing (NLP) algorithm was used to detect intraparenchymal, intraventricular, subdural, extradural, and subarachnoid haemorrhages, and calvarial fractures from clinical radiology reports. Second, reports were randomly selected so that there were around 80 scans with each of intraparenchymal, subdural, extradural, and subarachnoid haemorrhages, and calvarial fractures. Each of the selected scans was then screened for the following exclusion criteria: postoperative defect; absence of non-contrast (plain) axial series covering complete brain; and patient was younger than 7 years (estimated from cranial sutures [19] if data were unavailable).
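The selection procedure above can be pictured as two small steps: label each report by the findings it mentions, then drop scans that meet an exclusion criterion. The sketch below is only a toy version of that idea. The keyword patterns, the crude sentence-level negation rule, and the `ScanInfo` fields are all hypothetical; the notes do not describe the actual NLP rules.

```python
import re
from dataclasses import dataclass
from typing import Optional, Set

# Hypothetical patterns; the real NLP rules are not given in these notes.
BLEED = re.compile(r"ha?emorrhage|ha?ematoma|bleed", re.IGNORECASE)
LOCATIONS = {
    "intraparenchymal": re.compile(r"intraparenchymal|intracerebral", re.IGNORECASE),
    "intraventricular": re.compile(r"intraventricular", re.IGNORECASE),
    "subdural": re.compile(r"subdural", re.IGNORECASE),
    "extradural": re.compile(r"extradural|epidural", re.IGNORECASE),
    "subarachnoid": re.compile(r"subarachnoid", re.IGNORECASE),
}
FRACTURE = re.compile(r"fracture", re.IGNORECASE)
NEGATION = re.compile(r"\b(no|not|without)\b", re.IGNORECASE)

def label_report(report: str) -> Set[str]:
    """Return the set of findings mentioned in a free-text report."""
    findings: Set[str] = set()
    for sentence in re.split(r"[.;\n]+", report):
        if not sentence.strip() or NEGATION.search(sentence):
            continue  # naive: any negation word suppresses the whole sentence
        if BLEED.search(sentence):
            for name, pattern in LOCATIONS.items():
                if pattern.search(sentence):
                    findings.add(name)
        if FRACTURE.search(sentence):
            findings.add("calvarial_fracture")
    return findings

@dataclass
class ScanInfo:
    postoperative_defect: bool
    has_plain_axial_series: bool  # non-contrast axial series covering the whole brain
    age_years: Optional[float]    # None when the age is not recorded

def is_excluded(scan: ScanInfo) -> bool:
    """Apply the three exclusion criteria listed above."""
    if scan.postoperative_defect or not scan.has_plain_axial_series:
        return True
    # The source estimates a missing age from cranial sutures; this sketch
    # simply keeps scans whose age is unknown.
    return scan.age_years is not None and scan.age_years < 7
```

For example, `label_report("Acute subdural haematoma along the left convexity. No fracture seen.")` yields only the subdural label, because the second sentence is negated. A real system would need far more careful negation and synonym handling than this.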
Experiment
The original clinical radiology report and the consensus of three independent radiologists were considered the gold standard for the Qure25k and CQ500 datasets, respectively. Areas under the receiver operating characteristic curves (AUCs) were primarily used to assess the algorithms.
Evaluation metrics
For both CQ500 and Qure25k datasets, receiver operating characteristic (ROC) curves [20] were obtained for each of the target findings by varying the threshold and plotting the true positive rate (i.e., sensitivity) and false positive rate (i.e., 1 − specificity) at each threshold. Two operating points were chosen on the ROC curve so that sensitivity was approximately 0·9 (high sensitivity point) and specificity approximately 0·9 (high specificity point; see appendix p 5 for algorithm for operating point choice). Areas under the ROC curves (AUCs) and sensitivities and specificities at these two operating points were used to assess the algorithms.
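The threshold sweep described above can be sketched in a few lines of NumPy. The paper's actual rule for picking operating points is in its appendix and is not reproduced in these notes, so the sketch assumes the simplest possible rule: take the threshold whose sensitivity (or specificity) is closest to 0.9. The function names `roc_points` and `operating_points` are illustrative only.

```python
import numpy as np

def roc_points(y_true, y_score):
    """Sweep thresholds from high to low; a scan is called positive when
    its score is >= the threshold."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    thresholds = np.unique(y_score)[::-1]  # distinct scores, descending
    pos = y_score[y_true]
    neg = y_score[~y_true]
    tpr = np.array([(pos >= t).mean() for t in thresholds])  # sensitivity
    fpr = np.array([(neg >= t).mean() for t in thresholds])  # 1 - specificity
    return thresholds, tpr, fpr

def operating_points(y_true, y_score, target=0.9):
    """Pick the thresholds nearest to the target sensitivity and the target
    specificity (an assumed rule, not the paper's appendix algorithm)."""
    thresholds, tpr, fpr = roc_points(y_true, y_score)
    spec = 1.0 - fpr
    i_sens = int(np.argmin(np.abs(tpr - target)))
    i_spec = int(np.argmin(np.abs(spec - target)))

    def describe(i):
        return {
            "threshold": float(thresholds[i]),
            "sensitivity": float(tpr[i]),
            "specificity": float(spec[i]),
        }

    return {"high_sensitivity": describe(i_sens), "high_specificity": describe(i_spec)}
```

The high sensitivity point suits triage, where a missed haemorrhage is costly, while the high specificity point limits false alarms. Reporting both gives a fuller picture than the AUC alone.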