Data


MIMeta Dataset in a Nutshell

We release the MIMeta Dataset, a novel meta-dataset comprising 17 publicly available datasets with a total of 28 tasks. We additionally prepared a private set of tasks derived from different datasets, which will be used for validation and final testing of the submissions. All datasets included in the MIMeta dataset have previously been published under a Creative Commons licence. The dataset is similar to, and partially overlaps with, the Medical MNIST dataset. However, MIMeta goes beyond Medical MNIST in the number and diversity of tasks it includes. Moreover, all images in MIMeta are standardized to 224x224 pixels, which allows a more clinically meaningful analysis of the images.
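As a quick sanity check of these totals, the sketch below tallies the number of tasks per dataset ID as listed under Training Datasets on this page. The dictionary is transcribed by hand from that listing and is not part of the toolbox; the variable names are illustrative only.

```python
# Task counts per dataset ID, transcribed by hand from the
# "Training Datasets" listing on this page (not read from the toolbox).
TASKS_PER_DATASET = {
    "aml": 1, "bus": 2, "crc": 1, "cxr": 2, "derm": 1,
    "dr_regular": 5, "dr_uwf": 1, "fundus": 2, "glaucoma": 1,
    "mammo_calc": 3, "mammo_mass": 3, "oct": 1,
    "organs_axial": 1, "organs_coronal": 1, "organs_sagittal": 1,
    "pbc": 1, "pneumonia": 1,
}

num_datasets = len(TASKS_PER_DATASET)
num_tasks = sum(TASKS_PER_DATASET.values())
print(num_datasets, num_tasks)  # 17 28
```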

MIMeta PyTorch Toolbox

Alongside the dataset, we release the MIMeta PyTorch Toolbox, a Python package that gives easy access to all tasks in the MIMeta dataset. It provides a unified interface to the data and integrates easily into existing PyTorch projects. The toolbox is available on GitHub.

We also release the TorchCross Python package, which provides general PyTorch functionality for cross-domain few-shot learning and can be used to reproduce some simple baselines based on cross-domain fine-tuning and on meta-learning.
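To make the few-shot setting concrete, here is a minimal sketch of how a single N-way K-shot episode could be drawn from a labeled multiclass task. It uses only the Python standard library and is not the TorchCross API; the function name `sample_episode` and its parameters are hypothetical.

```python
import random
from collections import defaultdict

def sample_episode(labels, n_way, k_shot, n_query, rng=None):
    """Return (support, query) index lists for one N-way K-shot episode.

    Hypothetical helper for illustration; not part of TorchCross.
    """
    if rng is None:
        rng = random.Random()
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # A class is eligible only if it can fill both support and query sets.
    eligible = [c for c, idxs in by_class.items()
                if len(idxs) >= k_shot + n_query]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + n_query samples")
    support, query = [], []
    for c in rng.sample(eligible, n_way):
        picked = rng.sample(by_class[c], k_shot + n_query)
        support.extend(picked[:k_shot])   # used to adapt the model
        query.extend(picked[k_shot:])     # used to evaluate the adapted model
    return support, query

# Toy labels: 3 classes with 10 samples each.
toy_labels = [i % 3 for i in range(30)]
support, query = sample_episode(toy_labels, n_way=2, k_shot=5, n_query=3,
                                rng=random.Random(0))
print(len(support), len(query))  # 10 6
```

The support and query indices are disjoint by construction, since each class's samples are drawn once without replacement and then split.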

Download the Data

Training Datasets


Dataset ID: aml

Data source: AML Cytomorphology LMU

Number of Images: 18365

Summary: Morphological Dataset of Leukocytes containing expert-labeled single-cell images from peripheral blood smears of patients with Acute Myeloid Leukemia and patients without signs of hematological malignancy

Tasks

Morphological class – Multiclass classification
There are 15 labels: BAS Basophil; EBO Erythroblast; EOS Eosinophil; KSC Smudge cell; LYA Lymphocyte (atypical); LYT Lymphocyte (typical); MMZ Metamyelocyte; MOB Monoblast; MON Monocyte; MYB Myelocyte; MYO Myeloblast; NGB Neutrophil (band); NGS Neutrophil (segmented); PMB Promyelocyte (bilobed); PMO Promyelocyte


Dataset ID: bus

Data source: Breast Ultrasound Images Dataset (BUSI)

Number of Images: 780

Summary: Dataset of breast ultrasound images of women between 25 and 75 years old. The data contains normal, benign, and malignant cases.

Tasks

Case category – Multiclass classification
There are 3 labels: normal; benign; malignant

Malignancy – Binary classification
There are 2 labels: no malignant finding; malignant finding
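The two tasks of this dataset appear to be related: the binary malignancy label can plausibly be derived from the three-way case category. The sketch below assumes that only the "malignant" category maps to "malignant finding"; this mapping is an assumption for illustration and is not confirmed by the dataset description.

```python
CASE_CATEGORIES = ["normal", "benign", "malignant"]
MALIGNANCY = ["no malignant finding", "malignant finding"]

def malignancy_from_category(category_index):
    """Map a case-category index to a binary malignancy index.

    Assumption: "normal" and "benign" both count as "no malignant finding".
    """
    return int(CASE_CATEGORIES[category_index] == "malignant")

derived = [MALIGNANCY[malignancy_from_category(i)]
           for i in range(len(CASE_CATEGORIES))]
print(derived)
```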


Dataset ID: crc

Data source: NCT-CRC-HE-100K

Number of Images: 107180

Summary: Image patches from hematoxylin & eosin (H&E) stained histological images of human colorectal cancer (CRC) and normal tissue.

Tasks

Tissue class – Multiclass classification
There are 9 labels: adipose (ADI); background (BACK); debris (DEB); lymphocytes (LYM); mucus (MUC); smooth muscle (MUS); normal colon mucosa (NORM); cancer-associated stroma (STR); colorectal adenocarcinoma epithelium (TUM)


Dataset ID: cxr

Data source: ChestX-ray14

Number of Images: 112120

Summary: Chest X-ray dataset containing 112,120 frontal-view X-ray images with annotations for 14 common thorax diseases

Tasks

Disease labels – Multilabel classification
There are 14 labels: Atelectasis; Cardiomegaly; Effusion; Infiltration; Mass; Nodule; Pneumonia; Pneumothorax; Consolidation; Edema; Emphysema; Fibrosis; Pleural_Thickening; Hernia

Patient gender – Binary classification
There are 2 labels: M; F
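For the multilabel disease task above, an image can carry several of the 14 labels at once, so targets are usually represented as multi-hot vectors rather than a single class index. The following is a minimal sketch of that encoding; the label order follows the list above, and the helper name `multi_hot` is hypothetical and not taken from the toolbox.

```python
# Label order as listed for the "Disease labels" task above.
CXR_LABELS = [
    "Atelectasis", "Cardiomegaly", "Effusion", "Infiltration", "Mass",
    "Nodule", "Pneumonia", "Pneumothorax", "Consolidation", "Edema",
    "Emphysema", "Fibrosis", "Pleural_Thickening", "Hernia",
]
LABEL_TO_INDEX = {name: i for i, name in enumerate(CXR_LABELS)}

def multi_hot(findings):
    """Encode a collection of label names as a 14-dimensional 0/1 vector."""
    target = [0] * len(CXR_LABELS)
    for name in findings:
        target[LABEL_TO_INDEX[name]] = 1
    return target

example = multi_hot(["Cardiomegaly", "Effusion"])
print(example)  # [0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

An image without any of the 14 findings is then encoded as the all-zero vector.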


Dataset ID: derm

Data source: HAM10000

Number of Images: 11720

Summary: Dermatoscopic images of common pigmented skin lesions from different populations acquired and stored by different modalities.

Tasks

Disease category – Multiclass classification
There are 7 labels: Melanoma; Melanocytic nevus; Basal cell carcinoma; Actinic keratosis / Bowen’s disease (intraepithelial carcinoma); Benign keratosis (solar lentigo / seborrheic keratosis / lichen planus-like keratosis); Dermatofibroma; Vascular lesion


Dataset ID: dr_regular

Data source: DeepDRiD

Number of Images: 2000

Summary: Dataset of fundus images with diabetic retinopathy grades and image quality annotations.

Tasks

DR level – Ordinal regression
There are 5 labels: Grade 0: No apparent retinopathy; Grade 1: Mild – NPDR; Grade 2: Moderate – NPDR; Grade 3: Severe – NPDR; Grade 4: PDR

Overall quality – Binary classification
There are 2 labels: Quality is not good enough for the diagnosis of retinal diseases; Quality is good enough for the diagnosis of retinal diseases

Artifact – Ordinal regression
There are 6 labels: Do not contain artifacts; Outside the aortic arch with range less than 1/4 of the image; Do not affect the macular area with scope less than 1/4; Cover more than 1/4, less than 1/2 of the image; Cover more than 1/2 without fully cover the posterior pole; Cover the entire posterior pole

Clarity – Ordinal regression
There are 5 labels: Only Level 1 vascular arch can be identified; Can identify Level 2 vascular arch and a small number of lesions; Can identify Level 3 vascular arch and some lesions; Can identify Level 3 vascular arch and most lesions; Can identify Level 3 vascular arch and all lesions

Field definition – Ordinal regression
There are 5 labels: Do not include the optic disc and macular; Only contain either optic disc or macula; Contain both optic disc and macula; The optic disc and macula are within 2PD of the center; The optic disc and macula are within 1PD of the center
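The ordinal regression tasks above have ordered labels, so a prediction that is off by one grade should count as less wrong than one that is off by four. One common way to respect this order is to encode a grade as a set of cumulative binary targets ("is the grade greater than j?"). The sketch below shows that encoding and its decoding; it is one possible approach, not necessarily the one used by the toolbox or the challenge evaluation, and the function names are hypothetical.

```python
def ordinal_targets(grade, num_grades):
    """Encode grade g in {0, ..., K-1} as K-1 binary targets.

    target[j] is 1 if g > j, else 0.
    """
    if not 0 <= grade < num_grades:
        raise ValueError("grade out of range")
    return [1 if grade > j else 0 for j in range(num_grades - 1)]

def decode_ordinal(probabilities, threshold=0.5):
    """Predicted grade = number of 'greater than j' outputs above threshold."""
    return sum(1 for p in probabilities if p > threshold)

# DR level has 5 grades (0 to 4), so each target has 4 entries.
print(ordinal_targets(3, 5))                 # [1, 1, 1, 0]
print(decode_ordinal([0.9, 0.8, 0.3, 0.1]))  # 2
```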


Dataset ID: dr_uwf

Data source: DeepDRiD

Number of Images: 250

Summary: Dataset of ultra-widefield fundus images with annotations for diabetic retinopathy grading

Tasks

DR level – Ordinal regression
There are 5 labels: Grade 0: No apparent retinopathy; Grade 1: Mild – NPDR; Grade 2: Moderate – NPDR; Grade 3: Severe – NPDR; Grade 4: PDR


Dataset ID: fundus

Data source: RFMiD

Number of Images: 3200

Summary: Multi-disease Retinal Fundus Image Dataset consisting of 3200 fundus images captured using three different fundus cameras with 45 conditions annotated through adjudicated consensus of two senior retinal experts as well as an overall disease presence label.

Tasks

Disease presence – Binary classification
There are 2 labels: normal; abnormal

Disease labels – Multilabel classification
There are 45 labels: DR; ARMD; MH; DN; MYA; BRVO; TSLN; ERM; LS; MS; CSR; ODC; CRVO; TV; AH; ODP; ODE; ST; AION; PT; RT; RS; CRS; EDN; RPEC; MHL; RP; CWS; CB; ODPM; PRH; MNF; HR; CRAO; TD; CME; PTCR; CF; VH; MCA; VS; BRAO; PLQ; HPED; CL


Dataset ID: glaucoma

Data source: Chaksu

Number of Images: 1345

Summary: Glaucoma-specific Indian ethnicity retinal fundus images acquired using three devices. Five expert ophthalmologists provided annotations on whether the subject is glaucoma suspect or not.

Tasks

Glaucoma suspect – Binary classification
There are 2 labels: Normal; Suspect


Dataset ID: mammo_calc

Data source: CBIS-DDSM

Number of Images: 1872

Summary: Cropped regions of interest (calcifications) from a curated breast imaging dataset of screening mammographies

Tasks

Pathology – Binary classification
There are 2 labels: benign; malignant

Calc type – Multilabel classification
There are 14 labels: AMORPHOUS; COARSE; DYSTROPHIC; EGGSHELL; FINE_LINEAR_BRANCHING; LARGE_RODLIKE; LUCENT_CENTER; LUCENT_CENTERED; MILK_OF_CALCIUM; PLEOMORPHIC; PUNCTATE; ROUND_AND_REGULAR; SKIN; VASCULAR

Calc distribution – Multilabel classification
There are 5 labels: CLUSTERED; DIFFUSELY_SCATTERED; LINEAR; REGIONAL; SEGMENTAL


Dataset ID: mammo_mass

Data source: CBIS-DDSM

Number of Images: 1696

Summary: Cropped regions of interest (masses) from a curated breast imaging dataset of screening mammographies

Tasks

Pathology – Binary classification
There are 2 labels: benign; malignant

Mass shape – Multilabel classification
There are 8 labels: ARCHITECTURAL_DISTORTION; ASYMMETRIC_BREAST_TISSUE; FOCAL_ASYMMETRIC_DENSITY; IRREGULAR; LOBULATED; LYMPH_NODE; OVAL; ROUND

Mass margins – Multilabel classification
There are 5 labels: CIRCUMSCRIBED; ILL_DEFINED; MICROLOBULATED; OBSCURED; SPICULATED


Dataset ID: oct

Data source: Kermany OCT

Number of Images: 84484

Summary: Optical Coherence Tomography (OCT) images labeled for disease classification

Tasks

Disease – Multiclass classification
There are 4 labels: CNV; DME; DRUSEN; NORMAL


Dataset ID: organs_axial

Data source: Liver Tumor Segmentation Benchmark (LiTS)

Number of Images: 1645

Summary: Axial slices of 11 different organs extracted from the Liver Tumor Segmentation Benchmark (LiTS) dataset

Tasks

Organ label – Multiclass classification
There are 11 labels: heart; left lung; right lung; liver; spleen; pancreas; left kidney; right kidney; bladder; left femoral head; right femoral head


Dataset ID: organs_coronal

Data source: Liver Tumor Segmentation Benchmark (LiTS)

Number of Images: 1645

Summary: Coronal slices of 11 different organs extracted from the Liver Tumor Segmentation Benchmark (LiTS) dataset

Tasks

Organ label – Multiclass classification
There are 11 labels: heart; left lung; right lung; liver; spleen; pancreas; left kidney; right kidney; bladder; left femoral head; right femoral head


Dataset ID: organs_sagittal

Data source: Liver Tumor Segmentation Benchmark (LiTS)

Number of Images: 1645

Summary: Sagittal slices of 11 different organs extracted from the Liver Tumor Segmentation Benchmark (LiTS) dataset

Tasks

Organ label – Multiclass classification
There are 11 labels: heart; left lung; right lung; liver; spleen; pancreas; left kidney; right kidney; bladder; left femoral head; right femoral head


Dataset ID: pbc

Data source: PBC Dataset

Number of Images: 17092

Summary: A dataset of microscopic peripheral blood cell images of individual normal cells. The images were captured from individuals without infection, hematologic or oncologic disease and free of any pharmacologic treatment at the moment of blood collection.

Tasks

Cell class – Multiclass classification
There are 8 labels: neutrophil; eosinophil; basophil; lymphocyte; monocyte; immature granulocyte; erythroblast; platelet


Dataset ID: pneumonia

Data source: Pediatric Pneumonia Dataset

Number of Images: 5856

Summary: Pediatric chest X-ray images labeled for pneumonia classification

Tasks

Disease – Binary classification
There are 2 labels: NORMAL; PNEUMONIA