Cheers AI Demo for Diabetic Retinopathy and Glaucoma Detection

Features

Efficient Prediction Models

  • Efficient models fine-tuned from Inception-v3, with an emphasis on recall.
  • A powerful hospital MIS to create and track patients and their historical predictions.
  • Inputs reviewed by ophthalmologists and fed back into training.

Diabetic Retinopathy

Diabetic Retinopathy (DR) is an eye disease caused by diabetes that may lead to vision impairment and even blindness if it is not identified and treated early. Of the estimated 422 million diabetics globally, more than 148 million have DR and 48 million have Vision-Threatening DR (VTDR).

However, there are not enough specialists and eye care health workers, globally as well as locally, to screen everyone at risk, and the situation is especially acute in developing countries like Nepal. Nepal's difficult geographical terrain, with people living in remote areas that have limited or no access to clinics and screening facilities, makes the situation even worse.

Glaucoma

Glaucoma is a diverse group of disorders and the second leading cause of blindness, having already affected 91 million individuals worldwide. It has multiple risk factors, such as older age, elevated intraocular pressure (IOP), and thinner central corneal thickness. However, a person with one or more of these risk factors may or may not develop glaucoma, which makes accurate prediction of the disease difficult. Additionally, since glaucoma can be asymptomatic, detecting it before significant vision loss is critical. Hence, automated methods for predicting glaucoma could have a significant impact.

An intuitive app

An easy-to-use, access-managed platform with a primary focus on assisting our ophthalmologists.

Steps involved in Research, Model Creation, and Deployment

Glaucoma Prediction

What worked? (90% accuracy)

  1. DenseNet with a sequential classifier head and Ben Graham preprocessing on the himanchu dataset, using the NLLLoss criterion and the Adam optimizer (see the sketch below)
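
A minimal sketch of that setup, assuming torchvision's densenet121; the head sizes and learning rate are illustrative, not the exact values from our notebooks:

```python
import torch
import torch.nn as nn
from torchvision import models

# DenseNet backbone with a sequential classifier head ending in LogSoftmax,
# which is the input NLLLoss expects.
model = models.densenet121(pretrained=True)
model.classifier = nn.Sequential(
    nn.Linear(model.classifier.in_features, 256),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(256, 2),        # glaucoma vs. no glaucoma
    nn.LogSoftmax(dim=1),
)

criterion = nn.NLLLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```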

Limitation

  1. heavily dependent on the dataset
  2. disk extraction works well but is highly sensitive to the dataset
  3. trained on a very small dataset

Preliminary

  • understand the difference between predicting the possibility of glaucoma by classification versus by measurements

Preprocessing

  • Ben Graham transformation (see the sketch after this list)
  • extract the disk from fundus images
  • improve the extraction algorithms
  • perform EDA on disk images to find troubling cases (images where the crop does not work)
  • convert the Python disk-extraction function to a torch transform class (failed)
  • extracting the disk on the fly during training failed; create a disk dataset before training the model
  • train on the new dataset with and without the Ben Graham transformation
  • handle imbalanced classes with class weighting
  • convert the Kaggle dataset to the format our notebooks are templated for
  • extract disks from the Kaggle dataset using the new algorithm
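
The Ben Graham transformation referenced above is, in its widely shared Kaggle form, a Gaussian-blur subtraction that evens out lighting across fundus photos. A hedged sketch; the sigma and blend weights follow the common recipe and may differ from what our notebooks use:

```python
import cv2

def ben_graham(image, sigma_x=10):
    """Subtract a heavy Gaussian blur to normalize lighting.
    `image` is a BGR uint8 array as loaded by cv2.imread."""
    blurred = cv2.GaussianBlur(image, (0, 0), sigma_x)
    # 4*image - 4*blurred + 128: the weights from the common recipe
    return cv2.addWeighted(image, 4, blurred, -4, 128)
```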

Observations regarding disk generation

  • extracting the disk does not help (too many vague areas are left unfilled)
  • however, cropping shows very good promise (see the sketch after this list)
  • but cropping requires fundus images that are framed somewhat similarly
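
A sketch of the kind of border crop that shows promise: threshold away the black background and crop to the retina's bounding box. The implementation below is an assumption, and as noted above it works best when the images are framed similarly:

```python
import cv2
import numpy as np

def crop_fundus(image, threshold=10):
    """Crop a fundus photo to the bounding box of the non-black retina."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    mask = gray > threshold
    if not mask.any():                 # fully dark image: leave it unchanged
        return image
    rows = np.any(mask, axis=1)
    cols = np.any(mask, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    return image[r0:r1 + 1, c0:c1 + 1]
```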

Datasets

Training

  • Inception v3 with and without Ben Graham on the ocular, Kaggle, and himanchu datasets
  • Inception v3 with Ben Graham on the ocular, Kaggle, and himanchu datasets (disk-extracted, normal, and cropped)
  • DenseNet (linear head) with Ben Graham on the ocular, Kaggle, and himanchu datasets
  • DenseNet (linear head) with Ben Graham on the ocular, Kaggle, and himanchu datasets (disk-extracted, normal, and cropped)
  • DenseNet (sequential head) with Ben Graham on the ocular, Kaggle, and himanchu datasets
  • DenseNet (sequential head) with Ben Graham on the ocular, Kaggle, and himanchu datasets (disk-extracted, normal, and cropped)
  • add datasets from cheers for testing
  • add datasets from cheers for training

Diabetic Retinopathy Prediction

What worked? (90% accuracy)

  1. Large dataset from EyePACS (the Kaggle competition used 30% of the data for training and 70% for testing; after the competition, the test labels were published). We flipped the ratios for our use case.
  2. Remove out-of-focus images.
  3. Remove too-bright and too-dark images.
  4. Link to the clean dataset: https://www.kaggle.com/ayushsubedi/drunstratified
  5. To handle the class-imbalance issue, used weighted random samplers (see the sketch after this list). Undersampling to match the number of images in the smallest class (grade 4) did not work. Pickled the weights for future use.
  6. Ben Graham transformation and augmentations
  7. Inception v3 fine-tuning, with aux logits trained (better results compared to other architectures)
  8. Perform EDA on inference results to observe which images were causing issues
  9. Removed those images and created another dataset (link to the new dataset: https://www.kaggle.com/ayushsubedi/cleannonstratifieddiabeticretinopathy)
  10. Repeat steps 5, 6, and 7 on the new dataset
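
A sketch of the weighted random sampler from step 5. `labels` (the list of integer DR grades for the training images) and `train_dataset` are assumed to exist; the pickling step mirrors "pickled the weights for future use":

```python
import pickle
from collections import Counter

from torch.utils.data import DataLoader, WeightedRandomSampler

# Weight each sample inversely to its class frequency.
counts = Counter(labels)
class_weights = {grade: 1.0 / n for grade, n in counts.items()}
sample_weights = [class_weights[y] for y in labels]

# Save the weights so later runs can skip recomputing them.
with open("sample_weights.pkl", "wb") as f:
    pickle.dump(sample_weights, f)

sampler = WeightedRandomSampler(sample_weights,
                                num_samples=len(sample_weights),
                                replacement=True)
loader = DataLoader(train_dataset, batch_size=32, sampler=sampler)
```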

Datasets

  • Binary Stratified (cleaned): https://drive.google.com/drive/folders/12-60Gm7c_TMu1rhnMhSZjrkSqqAuSsQf?usp=sharing
  • Categorical Stratified (cleaned): https://drive.google.com/drive/folders/1-A_Mx9GdeUwCd03TUxUS3vwcutQHFFSM?usp=sharing
  • Non Stratified (cleaned): https://www.kaggle.com/ayushsubedi/drunstratified
  • Recleaned Non Stratified: https://www.kaggle.com/ayushsubedi/cleannonstratifieddiabeticretinopathy

Preliminary

  • What is diabetic retinopathy? https://www.youtube.com/watch?v=VIrkurR446s&ab_channel=khanacademymedicine
  • collect all previous analysis notebooks
  • conduct preliminary EDA (class balance, missing images, etc.)
  • create balanced test train split for DR (stratify)
  • store the dataset in drive for colab
  • identify a few research papers, create a file to store subsequently found research papers
  • identify right technology stack to use (for ML, training, PM, model versioning, stage deployment, actual deployment)
  • perform basic augmentation
  • create a version 0 base model
  • apply a random transfer learning model
  • create a metric for evaluation
  • store the model in Zenodo, or find another tool for model version control
  • create a model that takes image as an input
  • create a streamlit app that reads model
  • streamlit app to upload and test prediction
  • test deployment to Heroku's free tier
  • identify gaps
  • create a preliminary test set
  • create folder structures for saved model in the drive
  • figure out a way to move files from kaggle to drive (without download/upload)
  • research saving the model (the frugal way)
  • research saving the model to Google Drive after each epoch so that training can resume after unforeseen interruptions (see the sketch after this list)
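
A sketch of the per-epoch checkpointing idea from the last item, assuming Colab with Google Drive mounted; the path is illustrative:

```python
import torch

CKPT = "/content/drive/MyDrive/cheers/checkpoint.pth"  # illustrative path

def save_checkpoint(epoch, model, optimizer):
    # Save enough state to continue training exactly where we left off.
    torch.save({
        "epoch": epoch,
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
    }, CKPT)

def resume(model, optimizer):
    state = torch.load(CKPT)
    model.load_state_dict(state["model_state_dict"])
    optimizer.load_state_dict(state["optimizer_state_dict"])
    return state["epoch"] + 1   # epoch to resume training from
```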

Resource

  • upgrade to 25GB RAM in Google Colab possibly w/ Tesla P100 GPU
  • upgrade to Colab Pro

Baseline

  • medicmind grading (accuracy: 0.8)
  • medicmind classification (accuracy: 0.47)

Transfer Learning

  • ResNet
  • AlexNet
  • VGG
  • SqueezeNet
  • DenseNet
  • Inception (see the sketch after this list)
  • EfficientNet
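
Of these, Inception v3 performed best (see "What worked?" above). A sketch of fine-tuning it with the auxiliary classifier trained, assuming torchvision and five DR grades:

```python
import torch
import torch.nn as nn
from torchvision import models

# Inception v3 expects 299x299 inputs; in train mode it returns both the
# main logits and the auxiliary logits.
model = models.inception_v3(pretrained=True, aux_logits=True)
model.fc = nn.Linear(model.fc.in_features, 5)                      # 5 DR grades
model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, 5)

criterion = nn.CrossEntropyLoss()

def training_loss(images, targets):
    main_out, aux_out = model(images)
    # The 0.4 aux weighting follows the torchvision fine-tuning tutorial.
    return criterion(main_out, targets) + 0.4 * criterion(aux_out, targets)
```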

Dataset cleaning

  • create a backup of primary dataset (zip so that kaggle kernels can consume them too)
  • find algorithms to detect black/out-of-focus images (see the sketch after this list)
  • identify the correct thresholds for dark and out-of-focus images
  • remove black images
  • remove out of focus images
  • create a stratified dataset with 2015 data only (convert both train and test to train and use), removing black and out-of-focus images (also create a test set)
  • create a non-stratified dataset with 2015 clean data only (train, test, valid) (upload to Kaggle if Google Drive is full)
  • create a binary dataset (train, test, valid)
  • create confusion matrices (train, test, valid) after clean up (dark and blurry)
  • the model is confusing labels 0 and 1 with 2; is this due to disturbance in the class-0 images?
  • concluded that the result is due to the model not capturing class 0 well enough (because of undersampling)
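
A hedged sketch of the dark/out-of-focus filters; both thresholds are assumptions to be tuned, per "identify the correct thresholds" above:

```python
import cv2

def is_dark(path, brightness_threshold=20):
    """Flag images whose mean gray level falls below the threshold."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    return gray.mean() < brightness_threshold

def is_blurry(path, focus_threshold=100.0):
    """Variance of the Laplacian: low variance means few edges, i.e. blur."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    return cv2.Laplacian(gray, cv2.CV_64F).var() < focus_threshold
```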

Inference

  • create a CSV with prediction probabilities and real labels (see the sketch after this list)
  • calculate recall, precision, accuracy, confusion matrix
  • identify different prediction issues
  • relationship between differences in predictions and accuracy
  • inference issue: label 0 being predicted as 4
  • inference issue: Check images from Grade 2, 3 being predicted as Grade 0
  • inference issue: Check images from Grade 4 being predicted as Grade 0
  • inference issue: Check images from Grade 0 being predicted as Grade 4
  • inference issue: A significant Grade 2 is being predicted as Grade 0
  • inference issue: More than 50% of Grade 1 is being predicted as Grade 0
  • create a new dataset
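
A sketch of the inference CSV and metrics workflow from the first two items, assuming a CSV with `pred` and `label` columns (the file and column names are illustrative):

```python
import pandas as pd
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

df = pd.read_csv("preds.csv")          # one row per inference image
y_true, y_pred = df["label"], df["pred"]

print("accuracy :", accuracy_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred, average="macro"))
print("precision:", precision_score(y_true, y_pred, average="macro"))
print(confusion_matrix(y_true, y_pred))  # rows: true grade, cols: predicted
```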

Model Improvement

  • research the Kaggle-winning augmentations for DR
  • research appropriate augmentations: optical distortion, grid distortion, piecewise affine transform, horizontal flip, vertical flip, random rotation, random shift, random scale, a shift of RGB values, random brightness and contrast, additive Gaussian noise, blur, sharpening, embossing, random gamma, and cutout (see the sketch after this list)
  • train on various pretrained models or research which is supposed to be ideal for this case https://pytorch.org/vision/stable/models.html
  • create several neural nets (test different layers)
  • experiment with batch size
  • reduce lighting-condition effects
  • crop uninformative areas
  • create a custom dataloader based on the Ben Graham Kaggle-winning strategy
  • finetune vs feature extract
  • oversample
  • undersample
  • add specificity and sensitivity to indicators
  • create train loss and valid loss charts
  • test regression models (treat this as a grading problem)
  • pickle weights
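
A sketch of the augmentation menu above using albumentations; exact transform names vary by library version (e.g. cutout appears as CoarseDropout in recent releases), so treat this as an assumed pipeline, not the definitive one:

```python
import albumentations as A

augment = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.ShiftScaleRotate(shift_limit=0.05, scale_limit=0.1, rotate_limit=30, p=0.5),
    A.OpticalDistortion(p=0.3),
    A.GridDistortion(p=0.3),
    A.RGBShift(p=0.3),
    A.RandomBrightnessContrast(p=0.5),
    A.GaussNoise(p=0.3),
    A.Blur(blur_limit=3, p=0.2),
    A.RandomGamma(p=0.3),
    A.CoarseDropout(max_holes=8, p=0.3),  # cutout-style occlusion
])

# usage: augmented = augment(image=image)["image"]  (image is a numpy array)
```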

Additional Models

  • check if left/right eye classification model is required

Additional datasets

  • make datasets more extensive (add the 2015 test dataset with recovered labels to the training dataset)
  • add APTOS dataset
  • request labelled datasets from cheers
  • add datasets from cheers for testing
  • add datasets from cheers for training

Test datasets

  • find datasets for testing (dataset apart from APTOS and EyePACS)
  • update folder structures to match our use case
  • find a dataset for testing after making sure old test datasets are not in valid/train (class 4 will be empty)

Concepts/Research Papers