GeorgeBatch/README.md

Hi there 👋

I am a PhD Student in Health Data Science at Oxford, supervised by Professor Jens Rittscher and funded by Professor Fergus Gleeson. I focus on applications of Computer Vision 👀💻 to improving the diagnosis and treatment of patients with lung cancer as part of the DART lung health project (see my role in the project).

July 2024: my second workshop paper (pre-print 📝, code 💻) was accepted to the DEMI-2024 workshop at the MICCAI conference! 🚀 In our work "Evaluating histopathology foundation models for few-shot tissue clustering: an application to LC25000 augmented dataset cleaning", we (1) create a pipeline for grouping augmented images using foundation models, (2) release a decontaminated version of the LC25000 histopathology dataset, and (3) propose a minimal-setup benchmark for evaluating pathology foundation models. The cleaned dataset, annotation framework, and evaluation pipeline are available in the LC25000-clean repository.

February 2024: my first main conference paper 📝 (pre-print 📝, code 💻) was accepted to the ISBI-2024 conference! 🚀 In our work "Accurate Subtyping of Lung Cancers by Modelling Class Dependencies", we (1) construct a weakly-supervised multi-label lung cancer histology dataset from three public datasets (TCGA, TCIA-CPTAC, DHMC) and one in-house dataset (DART), and (2) propose a class-dependency injection method that allows learning robust bag representations suitable for multi-label problems under weakly-supervised settings. The dataset creation, model building, and training code are available in the dependency-mil repository.

September 2022: my first workshop paper 📝 (pre-print 📝, code 💻) was published at the MICCAI 2022 CaPTion workshop! 🚀 In our work "Active Data Enrichment by Learning What to Annotate in Digital Pathology", we (1) proposed a new comprehensive annotation protocol for lung cancer pathology, (2) proposed a new metric for comparing how well retrieval methods can prioritize examples from underrepresented classes, and (3) demonstrated that annotating and adding top-ranked examples to the training set improves algorithm performance more than annotating and adding random examples. Links: published paper, open-access paper, code.

December 2020: my first mini-conference working notes paper 📝 (code 💻) was published at the MediaEval 2020 Multimedia Benchmark workshop 🚀. In our work "Real-Time Polyp Segmentation Using U-Net with IoU Loss", we explored how a combination of differentiable IoU and BCE losses affects segmentation performance, measured by mean IoU and Dice score, when training a simple U-Net. Links: published open-access paper, code.
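
To make the idea of combining a differentiable IoU loss with BCE concrete, here is a minimal NumPy sketch of the forward computation. It is an illustration under assumptions, not the code from the paper: the function names, the `eps` smoothing term, and the `iou_weight` mixing coefficient are all hypothetical. The "soft" IoU is computed on predicted probabilities instead of thresholded masks, which is what keeps it differentiable when written in an autodiff framework.

```python
import numpy as np


def soft_iou_loss(probs, target, eps=1e-7):
    """Return 1 - soft IoU, using probabilities rather than hard masks."""
    probs = np.asarray(probs, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    intersection = np.sum(probs * target)
    union = np.sum(probs) + np.sum(target) - intersection
    # eps (an assumption of this sketch) avoids division by zero on empty masks.
    return float(1.0 - (intersection + eps) / (union + eps))


def bce_loss(probs, target, eps=1e-7):
    """Mean binary cross-entropy over all pixels."""
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    target = np.asarray(target, dtype=np.float64)
    per_pixel = -(target * np.log(probs) + (1.0 - target) * np.log(1.0 - probs))
    return float(np.mean(per_pixel))


def combined_loss(probs, target, iou_weight=0.5):
    """Weighted sum of the two losses; iou_weight is a hypothetical knob."""
    return (iou_weight * soft_iou_loss(probs, target)
            + (1.0 - iou_weight) * bce_loss(probs, target))


mask = np.array([[1, 0], [0, 1]])
uncertain = np.full((2, 2), 0.5)
print(combined_loss(mask, mask), combined_loss(uncertain, mask))
```

A perfect prediction drives both terms towards zero, while an uninformative prediction of 0.5 everywhere is penalised by both the overlap term and the per-pixel term.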


Public histology data sources. If you also want to start working with histopathology images, but do not have your own data or are still waiting for it, consider starting with the "Dartmouth Lung Cancer Histology Dataset" (DHMC), "The Cancer Genome Atlas" (TCGA), and "The Cancer Imaging Archive" (TCIA-CPTAC). Downloading large volumes of data is not a trivial task, so I documented my process in TCGA-lung-histology-download and TCIA-CPTAC-lung-histology-download.

Public natural image sources. If you lack medical data, you can also simulate parts of your future workflow on natural images, e.g. classifying medical images for the presence or absence of particular patterns can be similar to classifying natural images for the presence or absence of particular objects. I used images from the COCO dataset. You can see my work here: GeorgeBatch/cocoapi.
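
As a rough illustration of the presence-or-absence framing, the sketch below derives a binary label per image from a COCO-style annotation dictionary. It assumes only the standard top-level `images`, `annotations`, and `categories` keys; the `presence_labels` helper and the toy data are hypothetical and are not taken from the GeorgeBatch/cocoapi repository.

```python
def presence_labels(coco, category_name):
    """Map each image id to True/False: does it contain `category_name`?"""
    wanted = {c["id"] for c in coco["categories"] if c["name"] == category_name}
    if not wanted:
        raise ValueError(f"unknown category: {category_name!r}")
    # Start with "absent" everywhere, then flip images that have a match.
    labels = {img["id"]: False for img in coco["images"]}
    for ann in coco["annotations"]:
        if ann["category_id"] in wanted:
            labels[ann["image_id"]] = True
    return labels


# Toy stand-in for a real annotation file (hypothetical data).
toy = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "categories": [{"id": 17, "name": "cat"}, {"id": 18, "name": "dog"}],
    "annotations": [
        {"image_id": 1, "category_id": 17},
        {"image_id": 1, "category_id": 18},
        {"image_id": 2, "category_id": 18},
    ],
}
print(presence_labels(toy, "cat"))  # {1: True, 2: False, 3: False}
```

The resulting dictionary can serve as the target for an ordinary binary image classifier, mirroring a "pattern present / pattern absent" medical task.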


Education


Here are some of the best free online resources to boost your ML/DL knowledge 🚀 I am currently working through them, while skipping the repetitive parts ⏰


Pinned

  1. dependency-mil Public

    [ISBI 2024] Accurate Subtyping of Lung Cancers by Modelling Class Dependencies

    Jupyter Notebook 1

  2. moleculenet Public

    MSc Dissertation: Estimating Uncertainty in Machine Learning Models for Drug Discovery

    Jupyter Notebook 3 2

  3. kvasir-seg Public

    [MediaEval Medico Challenge'2020]: Real-time polyp segmentation using U-Net with IoU loss

    Jupyter Notebook 15 3

  4. simpsons-classification Public

    Image Classification Coursework: Classifying Simpsons Over the Past 25 Years

    Jupyter Notebook 4 2

  5. ultrasound-nerve-segmentation Public

    Undergraduate Research Project: "Ultrasound Nerve Segmentation" Kaggle Competition (2016)

    Python 5 1

  6. avaspataru/datathon Public

    Citadel DataOpen datathon: Analysis of the 2012 London Olympic Games

    Jupyter Notebook 2 2