LITTLE BY LITTLE

[1] Kaggle RSNA 2022 Cervical Spine Fracture Detection



위나 2022. 10. 9. 15:51

RSNA 2022 Cervical Spine Fracture Detection

https://www.kaggle.com/competitions/rsna-2022-cervical-spine-fracture-detection/overview/evaluation

 



  1. Goal
    1. To develop machine learning models that match the radiologists' performance in detecting and localizing fractures to the 7 vertebrae that comprise the cervical spine.
    2. The competition covers the cervical spine, the most common site of spinal fracture.
    3. Spinal fractures are especially frequent among the elderly, in whom degenerative disease and osteoporosis make fractures harder to detect.
    4. Fractures are diagnosed with CT (computed tomography) rather than x-rays (radiographs).
    5. After trauma, locating any vertebral fracture is important for preventing neurologic deterioration (symptoms resembling a coma) and paralysis.
  2. Evaluation method
    1. weighted multi-label logarithmic loss
    2. What must be predicted:
      1. a probability of a fracture at each of the 7 cervical vertebrae (C1~C7)
        • each fracture sub-type is its own row for every exam
      2. an overall probability of any fracture in the cervical spine
        1. An "any" label, 'patient_overall', exists for this prediction: it indicates that a fracture of ANY kind described before exists in the examination. Fractures in the skull base, thoracic spine, ribs, and clavicles are ignored. => The any label is weighted more highly than the specific fracture level sub-types (C1~C7).
    3. Participants must submit a set of predicted probabilities (a separate row for each cervical level subtype). We then take the log loss for each predicted probability versus its true label.
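The "separate row for each cervical level subtype" requirement can be sketched roughly as below. This is only an illustration under stated assumptions: the function name `expand_exam`, the `<exam>_<target>` row ID pattern, and the exam ID `exam001` are hypothetical and not taken from the competition page. Only the C1~C7 levels and the `patient_overall` label come from the notes above.

```python
# Hypothetical sketch: expand one exam's predictions into one row per target.
TARGETS = ["C1", "C2", "C3", "C4", "C5", "C6", "C7", "patient_overall"]


def expand_exam(exam_id, probs):
    """Return (row_id, probability) pairs, one per prediction target.

    exam_id -- any exam identifier (illustrative, not a real ID)
    probs   -- dict mapping each target name to a predicted probability
    """
    rows = []
    for target in TARGETS:
        p = probs[target]
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{target}: probability {p} is outside [0, 1]")
        # The "<exam>_<target>" pattern is an assumption of this sketch.
        rows.append((f"{exam_id}_{target}", p))
    return rows


# Example: a suspected C2 fracture, so the overall probability is high as well.
example_probs = {t: 0.1 for t in TARGETS}
example_probs["C2"] = 0.8
example_probs["patient_overall"] = 0.85
rows = expand_exam("exam001", example_probs)
```

Each exam therefore contributes eight probabilities: seven vertebra-level values plus the overall one.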

The cervical vertebrae consist of seven bones (C1~C7)


Evaluation method: weighted log loss
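A weighted multi-label log loss can be sketched as below for a single exam. The notes above only say that the any label is weighted more highly than the C1~C7 labels, so the weight values here are placeholders, and dividing by the sum of the weights is an assumption of this sketch, not a confirmed detail of the official metric.

```python
import math


def weighted_log_loss(y_true, y_pred, weights, eps=1e-15):
    """Weighted binary log loss over the labels of one exam.

    y_true  -- list of 0/1 labels (C1..C7, then patient_overall)
    y_pred  -- list of predicted probabilities, same order
    weights -- list of per-label weights, same order
    """
    total = 0.0
    weight_sum = 0.0
    for t, p, w in zip(y_true, y_pred, weights):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0 or 1
        total += -w * (t * math.log(p) + (1 - t) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum  # normalization choice is an assumption


# Placeholder weights: the overall label simply counts more than each vertebra.
weights = [1, 1, 1, 1, 1, 1, 1, 7]
y_true = [0, 1, 0, 0, 0, 0, 0, 1]
y_pred = [0.1, 0.8, 0.1, 0.1, 0.1, 0.1, 0.1, 0.85]
loss = weighted_log_loss(y_true, y_pred, weights)
```

Because the overall label carries a larger weight, a confident mistake on `patient_overall` costs more than the same mistake on a single vertebra.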


Prior research

An evaluation of cervical spine fracture detection using a CNN

: Results part

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8324280/

 

CT Cervical Spine Fracture Detection Using a Convolutional Neural Network



Abstract

Aim :

Evaluate the performance of a CNN that detects cervical spine fractures on CT by comparing it with radiologists

To evaluate the performance of a convolutional neural network (CNN) developed by Aidoc (www.aidoc.com) for the detection of cervical spine fractures on CT.

  1. Compare CNN performance with radiologist performance: We establish the presence of fractures based on retrospective clinical diagnosis and compare the CNN performance with that of radiologists.
  2. Aidoc’s CNN currently runs continuously on our hospital system and functions as a triage and notification software for analysis and detection of cervical spine fractures.
  3. However, we purposefully conducted a retrospective study on cervical spine studies performed before system-wide deployment, as we wanted to compare CNN performance to radiologist performance without the aid of the tool. A proficient algorithm may help identify and triage (sort by severity) studies for the radiologist to review more urgently, helping to ensure faster diagnoses.

MATERIALS AND METHODS:

  1. We evaluated C-spine, an FDA-approved convolutional neural network developed by Aidoc to detect cervical spine fractures on CT.
  2. 665 examinations are included.
  3. The κ coefficients, sensitivity, specificity, and positive and negative predictive values were calculated with 95% CIs.

RESULTS:

  1. Convolutional neural network accuracy in cervical spine fracture detection was 92% (95% CI, 90%–94%), with 76% sensitivity and 97% specificity.
  2. The radiologist accuracy was 95% (95% CI, 94%–97%), with 93% sensitivity and 96% specificity.
  3. Fractures missed by the convolutional neural network and by radiologists were similar by level and location and included fractured anterior osteophytes, transverse processes, and spinous processes, as well as lower cervical spine fractures that are often obscured by CT beam attenuation.

CONCLUSIONS:

  1. The convolutional neural network holds promise at both worklist prioritization and assisting radiologists in cervical spine fracture detection on CT.
  2. Understanding the strengths and weaknesses of the convolutional neural network is essential before its successful incorporation into clinical practice.
  3. Further refinements in sensitivity will improve convolutional neural network diagnostic utility.
  • Few prior studies exist (motivation 1)
    • To our knowledge, no studies evaluating AI in detecting cervical spine fractures on CT have been published.
  • Cervical spine fractures are common (motivation 2)
    • Cervical spine injury is common with greater than 3 million patients per year being evaluated for cervical spine injury in North America, and greater than 1 million patients with blunt trauma with suspected cervical spine injury per year being evaluated in the United States.
  • Why cervical spine fractures need to be detected (motivation 3)
    • Cervical spine injury can be associated with high morbidity and mortality, and a delay in diagnosis of an unstable fracture leading to inadequate immobilization may result in a catastrophic decline in neurologic function with devastating consequences.
  • Clearing the cervical spine through imaging is therefore a critical first step in the evaluation of patients with trauma, and multidetector CT has emerged as the standard of care imaging technique to evaluate cervical spine trauma.
  • Morbidity and mortality in patients with cervical spine injury can be reduced through rapid diagnosis and intervention.

Part 1

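  1. 869 cervical spine CT examinations were initially identified.
  2. Patient ages ranged from 16 to 98 years, with a mean of 60.28 and a median of 61.
  3. Of all patients, 54.5% (379) were male and 45.5% (316) were female.
  4. 12 examinations were duplicates.
  5. Excluded datasets
    1. 162 examinations could not be processed by the CNN.
    2. Of those 162, 157 could not be retrieved from the PACS (picture archiving and communication system).
      1. The reason is that they came from outside hospitals and had no identifiable DICOM header.
    3. The remaining 5 could not be analyzed by the CNN (because of technical issues with the datasets).
      1. These technical issues relate to the preprocessing stage of the CNN orchestrator.
      2. They include inconsistent DICOM tags and missing slices.
      3. Fracture prevalence was similar in the excluded and the included data.
        1. For example, 35 of the 162 excluded examinations were positive (22%) and 143 of the 695 included examinations were positive (21%), so the positive rates were similar.
        2. The data above were acquired before the CNN was deployed, so the share of excluded data is smaller among data acquired after deployment.
          1. This is due to the availability of technical support from the CNN developer and the presence of reliable DICOM tags.
    4. As a result, the CNN's accuracy was judged to be comparable to the accuracy found in this study.
      1. Of the 695 remaining included examinations, 30 contained fractures outside the cervical spine (C1-C7) and were therefore excluded from this study.
      2. The final sample size was therefore 665.
      3. Of these, 143 were labeled positive and 522 negative.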
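The sample counts in Part 1 can be checked with a few lines of arithmetic. The order of the subtractions (duplicates first, then the 162 unprocessed examinations, then the 30 with fractures outside C1-C7) is inferred from the numbers, since 869 - 12 - 162 = 695; the variable names are illustrative.

```python
# Cohort flow reconstructed from the counts in Part 1.
identified = 869        # examinations initially identified
duplicates = 12
not_processed = 162     # 157 not retrievable from PACS + 5 with technical issues
outside_c_spine = 30    # fractures outside C1-C7

after_duplicates = identified - duplicates       # 857
included = after_duplicates - not_processed      # 695
final_sample = included - outside_c_spine        # 665

positives, negatives = 143, 522

# Fracture prevalence in the excluded vs. included examinations, in percent.
prevalence_excluded = 100 * 35 / not_processed
prevalence_included = 100 * 143 / included

print(f"included={included}, final={final_sample}")
print(f"excluded prevalence={prevalence_excluded:.1f}%, "
      f"included prevalence={prevalence_included:.1f}%")
```

The two prevalence values round to the 22% and 21% quoted above, and the positive and negative labels add up to the final sample of 665.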

Part 2: radiologist performance results / CNN performance results

  1. true-positive (fracture reported, and actually positive): 133 / 109
  2. true-negative (no fracture reported, and actually negative): 502 / 505
  3. false-positive (fracture reported, but negative on MR imaging and CNN results): 20 / 17
  4. false-negative (no fracture reported, but positive on MR imaging and CNN results & could be visualized in retrospect on the cervical spine CT): 10 / 34
  5. Therefore PPV = 87% (95% CI, 81%-92%) / same
  6. NPV = 98% (CI, 97%-99%) / 94% (CI, 91%-96%)
  7. Sensitivity = 93% (CI, 88%-97%) / 76% (CI, 58%-83%)
  8. Specificity = 96% (CI, 94%-98%) / 97% (CI, 95%-98%)
  9. Percent Agreement = 95.5% (CI, 94%-97%) / 92% (CI, 90%-95%)
  10. κ coefficient = 0.87 (CI, 0.82-0.92) / 0.76 (CI, 0.70-0.82)
  11. Time taken: the time from acquisition until a finalized report ranged from 33-43 min for the radiologist / 3-8 min for the CNN
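The point estimates in Part 2 follow directly from the four confusion-matrix counts. The sketch below recomputes them, assuming the κ coefficient is Cohen's kappa between each reader and the reference diagnosis; the function name `diagnostic_metrics` is illustrative. Confidence intervals are not recomputed here.

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Derive standard diagnostic metrics from confusion-matrix counts."""
    n = tp + tn + fp + fn
    observed = (tp + tn) / n  # percent agreement (accuracy)
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "agreement": observed,
        "kappa": (observed - expected) / (1 - expected),
    }


radiologists = diagnostic_metrics(tp=133, tn=502, fp=20, fn=10)
cnn = diagnostic_metrics(tp=109, tn=505, fp=17, fn=34)

for name, result in (("radiologists", radiologists), ("CNN", cnn)):
    print(name, {key: round(value, 2) for key, value in result.items()})
```

The largest gap between the two readers is in sensitivity: the CNN has 34 false negatives against the radiologists' 10, while specificity is nearly identical.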

Part 3 : 

  1. Of the true-positives that the radiologists missed, 7 were detected by the CNN.
  2. Of the true-positives detected by the CNN, 4 were chronic.

Summary of the results comparing CNN and radiologist performance
Figure comparing the numbers of false-negatives and false-positives for the radiologists and the CNN


 
