We release the MIMeta Dataset, a novel meta-dataset composed of 17 publicly available datasets containing a total of 28 tasks. We additionally prepared a private set of tasks derived from different datasets, which will be used for validation and final testing of the submissions. All datasets included in the MIMeta dataset have been previously published under a Creative Commons licence. The dataset bears similarity to, and partially overlaps with, the Medical MNIST dataset. However, MIMeta goes beyond Medical MNIST in the number and diversity of the tasks it includes. Moreover, all images in MIMeta are standardized to a size of 224x224 pixels, which allows a more clinically meaningful analysis of the images.
In addition to the dataset, we release the MIMeta PyTorch Toolbox, a Python package that provides access to all tasks in the MIMeta dataset through a unified interface and integrates easily into existing PyTorch projects. The toolbox is available on GitHub.
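To illustrate what a unified, PyTorch-compatible task interface can look like, the sketch below defines a hypothetical `TaskDataset` class. It does not use the actual MIMeta toolbox API; the class name, its fields, and the in-memory sample list are assumptions. It relies only on the fact that PyTorch's map-style datasets are objects implementing `__len__` and `__getitem__`, which is all `torch.utils.data.DataLoader` needs to iterate over them.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class TaskDataset:
    """Hypothetical stand-in for one MIMeta task (not the toolbox's real class)."""

    dataset_id: str      # e.g. "bus"
    task_name: str       # e.g. "Malignancy"
    task_type: str       # e.g. "binary classification"
    label_names: List[str]
    # (image, label) pairs; a real loader would read 224x224 images from disk instead
    samples: List[Tuple[Any, Any]] = field(default_factory=list)

    @property
    def num_classes(self) -> int:
        return len(self.label_names)

    # Map-style dataset protocol: these two methods are what a DataLoader relies on.
    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> Tuple[Any, Any]:
        return self.samples[index]


# Usage with placeholder strings standing in for image tensors.
task = TaskDataset(
    dataset_id="bus",
    task_name="Malignancy",
    task_type="binary classification",
    label_names=["no malignant finding", "malignant finding"],
    samples=[("img0", 0), ("img1", 1)],
)
```

Keeping the task metadata (dataset ID, task type, label names) next to the samples is one way to let generic training code pick the right output head and loss for each task.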
We also release the TorchCross Python package, which provides general PyTorch functionality for cross-domain few-shot learning and can be used to reproduce simple baselines based on cross-domain fine-tuning and on meta-learning.
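A common way to set up few-shot learning, particularly for meta-learning, is in N-way K-shot episodes: N classes are drawn from a task, K labeled support images per class are used for adaptation, and further query images from the same classes are used for evaluation. The sketch below samples one such episode from a list of integer labels using only the standard library. The function `sample_episode` and its parameters are hypothetical and are not part of TorchCross.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def sample_episode(
    labels: Sequence[int], n_way: int, k_shot: int, n_query: int, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (support_indices, query_indices) for one N-way K-shot episode."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # Only classes with enough images for both the support and the query set qualify.
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= k_shot + n_query]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + n_query examples")
    support: List[int] = []
    query: List[int] = []
    for c in rng.sample(sorted(eligible), n_way):
        # Draw disjoint support and query indices for this class.
        chosen = rng.sample(by_class[c], k_shot + n_query)
        support.extend(chosen[:k_shot])
        query.extend(chosen[k_shot:])
    return support, query


# Usage: 3 classes with 10 images each, 2-way 3-shot with 4 query images per class.
labels = [0] * 10 + [1] * 10 + [2] * 10
support, query = sample_episode(labels, n_way=2, k_shot=3, n_query=4)
```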
Dataset ID: aml
Data source: AML Cytomorphology LMU
Number of Images: 18365
Summary: Morphological Dataset of Leukocytes containing expert-labeled single-cell images from peripheral blood smears of patients with Acute Myeloid Leukemia and patients without signs of hematological malignancy
Morphological class – Multiclass classification
There are 15 labels: BAS Basophil; EBO Erythroblast; EOS Eosinophil; KSC Smudge cell; LYA Lymphocyte (atypical); LYT Lymphocyte (typical); MMZ Metamyelocyte; MOB Monoblast; MON Monocyte; MYB Myelocyte; MYO Myeloblast; NGB Neutrophil (band); NGS Neutrophil (segmented); PMB Promyelocyte (bilobed); PMO Promyelocyte
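In a multiclass task each image carries exactly one label, usually stored as an integer index or a one-hot vector. The sketch below maps the 15 AML cell-type abbreviations to indices in the order listed above; that ordering is an assumption for illustration, and the toolbox may index the classes differently.

```python
from typing import List

# Assumed label order: the order in which the classes are listed in this document.
AML_LABELS = ["BAS", "EBO", "EOS", "KSC", "LYA", "LYT", "MMZ", "MOB",
              "MON", "MYB", "MYO", "NGB", "NGS", "PMB", "PMO"]
LABEL_TO_INDEX = {name: i for i, name in enumerate(AML_LABELS)}


def encode(abbrev: str) -> int:
    """Integer class index, as used by a softmax / cross-entropy classifier."""
    return LABEL_TO_INDEX[abbrev]


def one_hot(abbrev: str) -> List[int]:
    """One-hot vector with a single 1 at the class index."""
    vec = [0] * len(AML_LABELS)
    vec[encode(abbrev)] = 1
    return vec
```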
Dataset ID: bus
Data source: Breast Ultrasound Images Dataset (BUSI)
Number of Images: 780
Summary: Dataset of breast ultrasound images from women between 25 and 75 years old. The images are categorized as normal, benign, or malignant.
Case category – Multiclass classification
There are 3 labels: normal; benign; malignant
Malignancy – Binary classification
There are 2 labels: no malignant finding; malignant finding
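The two BUSI tasks are closely related: judging by the label names, the binary malignancy label can be read as a coarsening of the three-way case category, with normal and benign cases both mapped to "no malignant finding". The sketch below expresses that mapping; it is an assumption based on the label names, not a statement of how MIMeta derives the second task.

```python
# Label orders as listed above (assumed to match the task's index order).
CASE_CATEGORIES = ["normal", "benign", "malignant"]
MALIGNANCY_LABELS = ["no malignant finding", "malignant finding"]


def case_to_malignancy(case_index: int) -> int:
    """Map a case-category index (0-2) to a malignancy index (0-1)."""
    if not 0 <= case_index < len(CASE_CATEGORIES):
        raise ValueError(f"invalid case category index: {case_index}")
    return 1 if CASE_CATEGORIES[case_index] == "malignant" else 0
```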
Dataset ID: crc
Data source: NCT-CRC-HE-100K
Number of Images: 107180
Summary: Image patches from hematoxylin & eosin (H&E) stained histological images of human colorectal cancer (CRC) and normal tissue.
Tissue class – Multiclass classification
There are 9 labels: adipose (ADI); background (BACK); debris (DEB); lymphocytes (LYM); mucus (MUC); smooth muscle (MUS); normal colon mucosa (NORM); cancer-associated stroma (STR); colorectal adenocarcinoma epithelium (TUM)
Dataset ID: cxr
Data source: ChestX-ray14
Number of Images: 112120
Summary: Chest X-ray dataset containing 112,120 frontal-view X-ray images with annotations for 14 common thorax diseases
Disease labels – Multilabel classification
There are 14 labels: Atelectasis; Cardiomegaly; Effusion; Infiltration; Mass; Nodule; Pneumonia; Pneumothorax; Consolidation; Edema; Emphysema; Fibrosis; Pleural_Thickening; Hernia
Patient gender – Binary classification
There are 2 labels: M; F
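Unlike the multiclass tasks, a multilabel task allows any subset of its labels per image, including the empty set. A common encoding is a multi-hot vector with one binary entry per label, typically trained with a per-label sigmoid and binary cross-entropy. The sketch below encodes ChestX-ray14 findings in the label order listed above; both that order and the `|`-separated input string are assumptions for illustration.

```python
from typing import List

# Assumed label order: the order in which the findings are listed in this document.
CXR_LABELS = ["Atelectasis", "Cardiomegaly", "Effusion", "Infiltration", "Mass",
              "Nodule", "Pneumonia", "Pneumothorax", "Consolidation", "Edema",
              "Emphysema", "Fibrosis", "Pleural_Thickening", "Hernia"]


def multi_hot(findings: str, sep: str = "|") -> List[int]:
    """Encode a separator-joined list of findings as a 14-dim multi-hot vector."""
    present = {f.strip() for f in findings.split(sep) if f.strip()}
    unknown = present - set(CXR_LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    # One independent 0/1 entry per label; several entries may be 1 at once.
    return [1 if name in present else 0 for name in CXR_LABELS]
```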
Dataset ID: derm
Data source: HAM10000
Number of Images: 11720
Summary: Dermatoscopic images of common pigmented skin lesions from different populations acquired and stored by different modalities.
Disease category – Multiclass classification
There are 7 labels: Melanoma; Melanocytic nevus; Basal cell carcinoma; Actinic keratosis / Bowen’s disease (intraepithelial carcinoma); Benign keratosis (solar lentigo / seborrheic keratosis / lichen planus-like keratosis); Dermatofibroma; Vascular lesion
Dataset ID: dr_regular
Data source: DeepDRiD
Number of Images: 2000
Summary: Dataset of fundus images with diabetic retinopathy grades and image quality annotations.
DR level – Ordinal regression
There are 5 labels: Grade 0: No apparent retinopathy; Grade 1: Mild – NPDR; Grade 2: Moderate – NPDR; Grade 3: Severe – NPDR; Grade 4: PDR
Overall quality – Binary classification
There are 2 labels: Quality is not good enough for the diagnosis of retinal diseases; Quality is good enough for the diagnosis of retinal diseases
Artifact – Ordinal regression
There are 6 labels: Do not contain artifacts; Outside the aortic arch with range less than 1/4 of the image; Do not affect the macular area with scope less than 1/4; Cover more than 1/4, less than 1/2 of the image; Cover more than 1/2 without fully covering the posterior pole; Cover the entire posterior pole
Clarity – Ordinal regression
There are 5 labels: Only Level 1 vascular arch can be identified; Can identify Level 2 vascular arch and a small number of lesions; Can identify Level 3 vascular arch and some lesions; Can identify Level 3 vascular arch and most lesions; Can identify Level 3 vascular arch and all lesions
Field definition – Ordinal regression
There are 5 labels: Do not include the optic disc and macula; Only contain either optic disc or macula; Contain both optic disc and macula; The optic disc and macula are within 2PD of the center; The optic disc and macula are within 1PD of the center
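The DeepDRiD grading and quality tasks are ordinal: their labels are ordered, so predicting Grade 3 for a Grade 4 image is a smaller error than predicting Grade 0. One common way to handle ordinal targets, shown here as an illustration rather than as the encoding MIMeta uses, is to turn a label y among K ordered levels into K-1 cumulative binary targets [y > 0, y > 1, ..., y > K-2], and to decode a prediction by counting how many thresholds are exceeded. The function names are hypothetical.

```python
from typing import List, Sequence


def ordinal_encode(level: int, num_levels: int) -> List[int]:
    """Encode level in [0, num_levels) as num_levels - 1 cumulative binary targets."""
    if not 0 <= level < num_levels:
        raise ValueError("level out of range")
    # Entry k answers the binary question "is the level greater than k?"
    return [1 if level > k else 0 for k in range(num_levels - 1)]


def ordinal_decode(probs: Sequence[float], threshold: float = 0.5) -> int:
    """Predicted level = number of cumulative probabilities above `threshold`."""
    return sum(1 for p in probs if p > threshold)
```

For the 5-level DR grade, Grade 2 (moderate NPDR) becomes `[1, 1, 0, 0]`, and cumulative probabilities `[0.9, 0.8, 0.3, 0.1]` decode back to Grade 2.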
Dataset ID: dr_uwf
Data source: DeepDRiD
Number of Images: 250
Summary: Dataset of ultra-widefield fundus images with annotations for diabetic retinopathy grading
DR level – Ordinal regression
There are 5 labels: Grade 0: No apparent retinopathy; Grade 1: Mild – NPDR; Grade 2: Moderate – NPDR; Grade 3: Severe – NPDR; Grade 4: PDR
Dataset ID: fundus
Data source: RFMiD
Number of Images: 3200
Summary: Multi-disease Retinal Fundus Image Dataset consisting of 3200 fundus images captured using three different fundus cameras with 45 conditions annotated through adjudicated consensus of two senior retinal experts as well as an overall disease presence label.
Disease presence – Binary classification
There are 2 labels: normal; abnormal
Disease labels – Multilabel classification
There are 45 labels: DR; ARMD; MH; DN; MYA; BRVO; TSLN; ERM; LS; MS; CSR; ODC; CRVO; TV; AH; ODP; ODE; ST; AION; PT; RT; RS; CRS; EDN; RPEC; MHL; RP; CWS; CB; ODPM; PRH; MNF; HR; CRAO; TD; CME; PTCR; CF; VH; MCA; VS; BRAO; PLQ; HPED; CL
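RFMiD provides both a binary disease-presence label and 45 condition labels. Condition predictions are typically produced with one sigmoid output per label and decoded by thresholding. The sketch below does that with an assumed threshold of 0.5 and, under the additional assumption that an image is abnormal exactly when at least one condition is present, derives the disease-presence label from the decoded conditions. Both functions are hypothetical helpers, not part of the MIMeta toolbox.

```python
from typing import List, Sequence


def decode_multilabel(
    probs: Sequence[float], labels: Sequence[str], threshold: float = 0.5
) -> List[str]:
    """Return the names of all labels whose predicted probability reaches `threshold`."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    return [name for name, p in zip(labels, probs) if p >= threshold]


def presence_from_conditions(conditions: Sequence[str]) -> str:
    """Assumed rule: abnormal if and only if at least one condition is present."""
    return "abnormal" if conditions else "normal"
```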
Dataset ID: glaucoma
Data source: Chaksu
Number of Images: 1345
Summary: Glaucoma-specific retinal fundus images of subjects of Indian ethnicity, acquired using three devices. Five expert ophthalmologists provided annotations on whether or not each subject is a glaucoma suspect.
Glaucoma suspect – Binary classification
There are 2 labels: Normal; Suspect
Dataset ID: mammo_calc
Data source: CBIS-DDSM
Number of Images: 1872
Summary: Cropped regions of interest (calcifications) from a curated breast imaging dataset of screening mammograms
Pathology – Binary classification
There are 2 labels: benign; malignant
Calc type – Multilabel classification
There are 14 labels: AMORPHOUS; COARSE; DYSTROPHIC; EGGSHELL; FINE_LINEAR_BRANCHING; LARGE_RODLIKE; LUCENT_CENTER; LUCENT_CENTERED; MILK_OF_CALCIUM; PLEOMORPHIC; PUNCTATE; ROUND_AND_REGULAR; SKIN; VASCULAR
Calc distribution – Multilabel classification
There are 5 labels: CLUSTERED; DIFFUSELY_SCATTERED; LINEAR; REGIONAL; SEGMENTAL
Dataset ID: mammo_mass
Data source: CBIS-DDSM
Number of Images: 1696
Summary: Cropped regions of interest (masses) from a curated breast imaging dataset of screening mammograms
Pathology – Binary classification
There are 2 labels: benign; malignant
Mass shape – Multilabel classification
There are 8 labels: ARCHITECTURAL_DISTORTION; ASYMMETRIC_BREAST_TISSUE; FOCAL_ASYMMETRIC_DENSITY; IRREGULAR; LOBULATED; LYMPH_NODE; OVAL; ROUND
Mass margins – Multilabel classification
There are 5 labels: CIRCUMSCRIBED; ILL_DEFINED; MICROLOBULATED; OBSCURED; SPICULATED
Dataset ID: oct
Data source: Kermany OCT
Number of Images: 84484
Summary: Optical Coherence Tomography (OCT) images labeled for disease classification
Disease – Multiclass classification
There are 4 labels: CNV; DME; DRUSEN; NORMAL
Dataset ID: organs_axial
Data source: Liver Tumor Segmentation Benchmark (LiTS)
Number of Images: 1645
Summary: Axial slices of 11 different organs extracted from the Liver Tumor Segmentation Benchmark (LiTS) dataset
Organ label – Multiclass classification
There are 11 labels: heart; left lung; right lung; liver; spleen; pancreas; left kidney; right kidney; bladder; left femoral head; right femoral head
Dataset ID: organs_coronal
Data source: Liver Tumor Segmentation Benchmark (LiTS)
Number of Images: 1645
Summary: Coronal slices of 11 different organs extracted from the Liver Tumor Segmentation Benchmark (LiTS) dataset
Organ label – Multiclass classification
There are 11 labels: heart; left lung; right lung; liver; spleen; pancreas; left kidney; right kidney; bladder; left femoral head; right femoral head
Dataset ID: organs_sagittal
Data source: Liver Tumor Segmentation Benchmark (LiTS)
Number of Images: 1645
Summary: Sagittal slices of 11 different organs extracted from the Liver Tumor Segmentation Benchmark (LiTS) dataset
Organ label – Multiclass classification
There are 11 labels: heart; left lung; right lung; liver; spleen; pancreas; left kidney; right kidney; bladder; left femoral head; right femoral head
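The three organ tasks share the same 11 LiTS-derived organ classes but present them as 2D slices taken along different anatomical planes of 3D CT volumes. The sketch below shows how axial, coronal, and sagittal slices can be extracted from a volume stored as nested lists. It assumes the volume is indexed as `volume[z][y][x]`, with z running along the body axis; that axis convention is chosen for illustration, and this is not the MIMeta preprocessing code.

```python
from typing import List

Volume = List[List[List[int]]]  # assumed indexing: volume[z][y][x]
Slice2D = List[List[int]]


def axial_slice(volume: Volume, z: int) -> Slice2D:
    """Plane of constant z (rows = y, columns = x)."""
    return [row[:] for row in volume[z]]


def coronal_slice(volume: Volume, y: int) -> Slice2D:
    """Plane of constant y (rows = z, columns = x)."""
    return [plane[y][:] for plane in volume]


def sagittal_slice(volume: Volume, x: int) -> Slice2D:
    """Plane of constant x (rows = z, columns = y)."""
    return [[row[x] for row in plane] for plane in volume]


# Toy 2 x 3 x 4 volume whose voxel values encode their own coordinates (100z + 10y + x).
vol = [[[100 * z + 10 * y + x for x in range(4)] for y in range(3)] for z in range(2)]
```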
Dataset ID: pbc
Data source: PBC Dataset
Number of Images: 17092
Summary: A dataset of microscopic peripheral blood cell images of individual normal cells. The images were captured from individuals without infection, hematologic or oncologic disease and free of any pharmacologic treatment at the moment of blood collection.
Cell class – Multiclass classification
There are 8 labels: neutrophil; eosinophil; basophil; lymphocyte; monocyte; immature granulocyte; erythroblast; platelet
Dataset ID: pneumonia
Data source: Pediatric Pneumonia Dataset
Number of Images: 5856
Summary: Pediatric chest X-ray images labeled for pneumonia classification
Disease – Binary classification
There are 2 labels: NORMAL; PNEUMONIA
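The catalog above can be summarized in a small data structure. The sketch below records every task listed in this section as a (dataset ID, task name, task type) tuple and checks the totals stated at the top of the section: 17 datasets and 28 tasks. The tuple layout and the short task-type names are assumptions for illustration only.

```python
from collections import Counter

# (dataset ID, task name, task type) for every task listed in this section
TASKS = [
    ("aml", "Morphological class", "multiclass"),
    ("bus", "Case category", "multiclass"),
    ("bus", "Malignancy", "binary"),
    ("crc", "Tissue class", "multiclass"),
    ("cxr", "Disease labels", "multilabel"),
    ("cxr", "Patient gender", "binary"),
    ("derm", "Disease category", "multiclass"),
    ("dr_regular", "DR level", "ordinal"),
    ("dr_regular", "Overall quality", "binary"),
    ("dr_regular", "Artifact", "ordinal"),
    ("dr_regular", "Clarity", "ordinal"),
    ("dr_regular", "Field definition", "ordinal"),
    ("dr_uwf", "DR level", "ordinal"),
    ("fundus", "Disease presence", "binary"),
    ("fundus", "Disease labels", "multilabel"),
    ("glaucoma", "Glaucoma suspect", "binary"),
    ("mammo_calc", "Pathology", "binary"),
    ("mammo_calc", "Calc type", "multilabel"),
    ("mammo_calc", "Calc distribution", "multilabel"),
    ("mammo_mass", "Pathology", "binary"),
    ("mammo_mass", "Mass shape", "multilabel"),
    ("mammo_mass", "Mass margins", "multilabel"),
    ("oct", "Disease", "multiclass"),
    ("organs_axial", "Organ label", "multiclass"),
    ("organs_coronal", "Organ label", "multiclass"),
    ("organs_sagittal", "Organ label", "multiclass"),
    ("pbc", "Cell class", "multiclass"),
    ("pneumonia", "Disease", "binary"),
]

num_datasets = len({dataset_id for dataset_id, _, _ in TASKS})
num_tasks = len(TASKS)
tasks_by_type = Counter(task_type for _, _, task_type in TASKS)
```

Counting by type shows the mix of problem formulations in the public set: 9 multiclass, 8 binary, 6 multilabel, and 5 ordinal regression tasks.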