We release the MIMeta Dataset, a novel meta-dataset composed of 17 publicly available datasets containing a total of 28 tasks. We have additionally prepared a private set of tasks derived from different datasets, which will be used for validation and final testing of the submissions. All datasets included in the MIMeta dataset have previously been published under a Creative Commons license. The dataset is similar to, and partially overlaps with, the Medical MNIST dataset. However, MIMeta goes beyond Medical MNIST in the number and diversity of its tasks. Moreover, all images in MIMeta are standardized to a size of 224x224 pixels, which allows a more clinically meaningful analysis of the images.
In addition to the dataset, we release the MIMeta PyTorch Toolbox, a Python package that provides easy access to all tasks in the MIMeta dataset. It offers a unified interface to the data and integrates easily into existing PyTorch projects. The toolbox is available on GitHub.
We also release the TorchCross Python package, which provides general PyTorch functionality for cross-domain few-shot learning and can be used to reproduce simple baselines based on cross-domain fine-tuning and on meta-learning.
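Cross-domain few-shot methods are commonly trained and evaluated on N-way K-shot episodes: N classes are drawn from a task, with K labeled support examples and a few query examples per class. The following library-agnostic sketch shows one way to sample such an episode from a list of integer labels. `sample_episode` and its parameters are hypothetical and do not reproduce the TorchCross API.

```python
import random
from collections import defaultdict
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[int, int]  # (sample index in the dataset, episode-local class label)


def sample_episode(
    labels: Sequence[int],
    n_way: int,
    k_shot: int,
    n_query: int,
    rng: Optional[random.Random] = None,
) -> Tuple[List[Pair], List[Pair]]:
    """Sample one N-way K-shot episode from per-sample integer class labels."""
    rng = rng or random.Random()
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # A class qualifies only if it can fill both the support and the query set.
    eligible = sorted(c for c, idxs in by_class.items() if len(idxs) >= k_shot + n_query)
    if len(eligible) < n_way:
        raise ValueError("too few classes with k_shot + n_query samples")
    support: List[Pair] = []
    query: List[Pair] = []
    # Episode labels are re-indexed to 0..n_way-1, independent of the original IDs.
    for episode_label, cls in enumerate(rng.sample(eligible, n_way)):
        picked = rng.sample(by_class[cls], k_shot + n_query)
        support += [(i, episode_label) for i in picked[:k_shot]]
        query += [(i, episode_label) for i in picked[k_shot:]]
    return support, query


labels = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3]  # class 3 is too small to be drawn
support, query = sample_episode(labels, n_way=2, k_shot=1, n_query=2, rng=random.Random(0))
print(len(support), len(query))  # 2 4
```

Re-indexing labels per episode means a model cannot rely on fixed class IDs, so the same output size works for any episode drawn from any task.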
Dataset ID: aml
Number of Images: 18365
Summary: Morphological Dataset of Leukocytes containing expert-labeled single-cell images from peripheral blood smears of patients with Acute Myeloid Leukemia and patients without signs of hematological malignancy
Morphological class – Multiclass classification
There are 15 labels: BAS Basophil; EBO Erythroblast; EOS Eosinophil; KSC Smudge cell; LYA Lymphocyte (atypical); LYT Lymphocyte (typical); MMZ Metamyelocyte; MOB Monoblast; MON Monocyte; MYB Myelocyte; MYO Myeloblast; NGB Neutrophil (band); NGS Neutrophil (segmented); PMB Promyelocyte (bilobed); PMO Promyelocyte
Dataset ID: bus
Number of Images: 780
Summary: Dataset of breast ultrasound images of women between 25 and 75 years old. Each image is categorized as normal, benign, or malignant.
Case category – Multiclass classification
There are 3 labels: normal; benign; malignant
Malignancy – Binary classification
There are 2 labels: no malignant finding; malignant finding
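The binary Malignancy task appears to be a coarsening of the Case category task, with normal and benign cases both counting as "no malignant finding". Assuming that mapping (it is inferred from the label names, not stated in the dataset description), a derived binary task can be produced from multiclass labels as sketched below. `derive_binary` is a hypothetical helper, not part of the MIMeta toolbox.

```python
from typing import Iterable, List, Sequence


def derive_binary(
    multiclass_labels: Iterable[int],
    class_names: Sequence[str],
    positive_names: Iterable[str],
) -> List[int]:
    """Collapse multiclass label indices into 0/1 labels.

    A sample gets label 1 if its class name is in positive_names, else 0.
    """
    # list.index raises ValueError for a misspelled class name, which is desirable.
    positive = {list(class_names).index(name) for name in positive_names}
    return [int(y in positive) for y in multiclass_labels]


# Assumed mapping for the bus dataset: only "malignant" is a malignant finding.
case_category = ["normal", "benign", "malignant"]
print(derive_binary([0, 2, 1, 2], case_category, {"malignant"}))  # [0, 1, 0, 1]
```

The same pattern would apply to the pneumonia dataset, where BACTERIA and VIRUS both correspond to PNEUMONIA in the binary task.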
Dataset ID: crc
Number of Images: 107180
Summary: Image patches from hematoxylin & eosin (H&E) stained histological images of human colorectal cancer (CRC) and normal tissue.
Tissue class – Multiclass classification
There are 9 labels: adipose (ADI); background (BACK); debris (DEB); lymphocytes (LYM); mucus (MUC); smooth muscle (MUS); normal colon mucosa (NORM); cancer-associated stroma (STR); colorectal adenocarcinoma epithelium (TUM)
Dataset ID: cxr
Number of Images: 112120
Summary: Chest X-ray dataset containing 112,120 frontal-view X-ray images with annotations for 14 common thoracic diseases
Disease labels – Multilabel classification
There are 14 labels: Atelectasis; Cardiomegaly; Effusion; Infiltration; Mass; Nodule; Pneumonia; Pneumothorax; Consolidation; Edema; Emphysema; Fibrosis; Pleural_Thickening; Hernia
Patient sex – Binary classification
There are 2 labels: F; M
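The Disease labels task of the cxr dataset is multilabel: an image can carry several findings at once, or none. A minimal sketch of the usual multi-hot target encoding follows, assuming each image's findings are available as a list of label names (an empty list for an image without findings). `multi_hot` and `decode_multi_hot` are hypothetical helpers, not part of the MIMeta toolbox.

```python
from typing import Dict, List, Sequence

CXR_LABELS = [
    "Atelectasis", "Cardiomegaly", "Effusion", "Infiltration", "Mass",
    "Nodule", "Pneumonia", "Pneumothorax", "Consolidation", "Edema",
    "Emphysema", "Fibrosis", "Pleural_Thickening", "Hernia",
]
LABEL_INDEX: Dict[str, int] = {name: i for i, name in enumerate(CXR_LABELS)}


def multi_hot(findings: Sequence[str]) -> List[int]:
    """Encode finding names as a 0/1 vector with one slot per label."""
    vec = [0] * len(CXR_LABELS)
    for name in findings:
        vec[LABEL_INDEX[name]] = 1  # unknown names raise KeyError on purpose
    return vec


def decode_multi_hot(probs: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Turn per-label probabilities back into finding names."""
    return [name for name, p in zip(CXR_LABELS, probs) if p >= threshold]


print(multi_hot(["Cardiomegaly", "Effusion"]))
# [0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(multi_hot([]) == [0] * 14)  # True: no findings
```

Targets of this form are typically trained with an independent sigmoid output per label and binary cross-entropy (for example `torch.nn.BCEWithLogitsLoss`) rather than a softmax over all labels, since the labels are not mutually exclusive.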
Dataset ID: derm
Number of Images: 11720
Summary: Dermatoscopic images of common pigmented skin lesions from different populations acquired and stored by different modalities.
Disease category – Multiclass classification
There are 7 labels: Melanoma; Melanocytic nevus; Basal cell carcinoma; Actinic keratosis / Bowen’s disease (intraepithelial carcinoma); Benign keratosis (solar lentigo / seborrheic keratosis / lichen planus-like keratosis); Dermatofibroma; Vascular lesion
Dataset ID: dr_regular
Number of Images: 2000
Summary: Dataset of fundus images with diabetic retinopathy grades and image quality annotations.
DR level – Ordinal regression
There are 5 labels: Grade 0: No apparent retinopathy; Grade 1: Mild – NPDR; Grade 2: Moderate – NPDR; Grade 3: Severe – NPDR; Grade 4: PDR
Overall quality – Binary classification
There are 2 labels: Quality is not good enough for the diagnosis of retinal diseases; Quality is good enough for the diagnosis of retinal diseases
Artifact – Ordinal regression
There are 6 labels: Does not contain artifacts; Outside the aortic arch with range less than 1/4 of the image; Does not affect the macular area with scope less than 1/4; Covers more than 1/4, less than 1/2 of the image; Covers more than 1/2 without fully covering the posterior pole; Covers the entire posterior pole
Clarity – Ordinal regression
There are 5 labels: Only Level 1 vascular arch can be identified; Can identify Level 2 vascular arch and a small number of lesions; Can identify Level 3 vascular arch and some lesions; Can identify Level 3 vascular arch and most lesions; Can identify Level 3 vascular arch and all lesions
Field definition – Ordinal regression
There are 5 labels: Does not include the optic disc or macula; Contains only the optic disc or the macula; Contains both optic disc and macula; The optic disc and macula are within 2PD of the center; The optic disc and macula are within 1PD of the center
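Grades such as DR level are ordered, so predicting Grade 3 for a Grade 4 image is a smaller error than predicting Grade 0. One common way to exploit this order, sketched below as an assumption rather than as the baseline used for MIMeta, is cumulative binary encoding: a task with K grades becomes K - 1 binary questions of the form "is the grade greater than t?". `ordinal_encode` and `ordinal_decode` are illustrative names.

```python
from typing import List, Sequence


def ordinal_encode(grade: int, num_classes: int) -> List[int]:
    """Encode grade k as K-1 binary targets: target t answers 'is grade > t?'."""
    if not 0 <= grade < num_classes:
        raise ValueError(f"grade must be in [0, {num_classes - 1}]")
    return [int(grade > t) for t in range(num_classes - 1)]


def ordinal_decode(probs: Sequence[float], threshold: float = 0.5) -> int:
    """Predicted grade = number of 'grade > t' questions answered yes."""
    return sum(p > threshold for p in probs)


print(ordinal_encode(2, num_classes=5))      # [1, 1, 0, 0]
print(ordinal_decode([0.9, 0.8, 0.3, 0.1]))  # 2
```

Decoding by counting always yields a grade in [0, K-1], even if a model's per-threshold probabilities are not perfectly monotonic. The same encoding would fit the Artifact, Clarity and Field definition tasks, with K = 6, 5 and 5 respectively.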
Dataset ID: dr_uwf
Number of Images: 250
Summary: Dataset of ultra-widefield fundus images with annotations for diabetic retinopathy grading
DR level – Ordinal regression
There are 5 labels: Grade 0: No apparent retinopathy; Grade 1: Mild – NPDR; Grade 2: Moderate – NPDR; Grade 3: Severe – NPDR; Grade 4: PDR
Dataset ID: fundus
Number of Images: 3200
Summary: Multi-disease Retinal Fundus Image Dataset consisting of 3200 fundus images captured with three different fundus cameras. Each image is annotated with 45 conditions, determined through adjudicated consensus of two senior retinal experts, as well as with an overall disease presence label.
Disease presence – Binary classification
There are 2 labels: normal; abnormal
Disease labels – Multilabel classification
There are 45 labels: DR; ARMD; MH; DN; MYA; BRVO; TSLN; ERM; LS; MS; CSR; ODC; CRVO; TV; AH; ODP; ODE; ST; AION; PT; RT; RS; CRS; EDN; RPEC; MHL; RP; CWS; CB; ODPM; PRH; MNF; HR; CRAO; TD; CME; PTCR; CF; VH; MCA; VS; BRAO; PLQ; HPED; CL
Dataset ID: glaucoma
Number of Images: 1345
Summary: Glaucoma-specific retinal fundus images of individuals of Indian ethnicity, acquired using three devices. Five expert ophthalmologists annotated whether each subject is a glaucoma suspect.
Glaucoma suspect – Binary classification
There are 2 labels: Normal; Suspect
Dataset ID: mammo_calc
Number of Images: 1872
Summary: Cropped regions of interest (calcifications) from a curated breast imaging dataset of screening mammograms
Pathology – Binary classification
There are 2 labels: benign; malignant
Calc type – Multilabel classification
There are 14 labels: AMORPHOUS; COARSE; DYSTROPHIC; EGGSHELL; FINE_LINEAR_BRANCHING; LARGE_RODLIKE; LUCENT_CENTER; LUCENT_CENTERED; MILK_OF_CALCIUM; PLEOMORPHIC; PUNCTATE; ROUND_AND_REGULAR; SKIN; VASCULAR
Calc distribution – Multilabel classification
There are 5 labels: CLUSTERED; DIFFUSELY_SCATTERED; LINEAR; REGIONAL; SEGMENTAL
Dataset ID: mammo_mass
Number of Images: 1696
Summary: Cropped regions of interest (masses) from a curated breast imaging dataset of screening mammograms
Pathology – Binary classification
There are 2 labels: benign; malignant
Mass shape – Multilabel classification
There are 8 labels: ARCHITECTURAL_DISTORTION; ASYMMETRIC_BREAST_TISSUE; FOCAL_ASYMMETRIC_DENSITY; IRREGULAR; LOBULATED; LYMPH_NODE; OVAL; ROUND
Mass margins – Multilabel classification
There are 5 labels: CIRCUMSCRIBED; ILL_DEFINED; MICROLOBULATED; OBSCURED; SPICULATED
Dataset ID: oct
Number of Images: 109309
Summary: Optical Coherence Tomography (OCT) images labeled for disease classification
Disease class – Multiclass classification
There are 4 labels: CNV; DME; DRUSEN; NORMAL
Urgent referral – Binary classification
There are 2 labels: NO; YES
Dataset ID: organs_axial
Number of Images: 1645
Summary: Axial slices of 11 different organs extracted from the Liver Tumor Segmentation Benchmark (LiTS) dataset
Organ label – Multiclass classification
There are 11 labels: heart; left lung; right lung; liver; spleen; pancreas; left kidney; right kidney; bladder; left femoral head; right femoral head
Dataset ID: organs_coronal
Number of Images: 1645
Summary: Coronal slices of 11 different organs extracted from the Liver Tumor Segmentation Benchmark (LiTS) dataset
Organ label – Multiclass classification
There are 11 labels: heart; left lung; right lung; liver; spleen; pancreas; left kidney; right kidney; bladder; left femoral head; right femoral head
Dataset ID: organs_sagittal
Number of Images: 1645
Summary: Sagittal slices of 11 different organs extracted from the Liver Tumor Segmentation Benchmark (LiTS) dataset
Organ label – Multiclass classification
There are 11 labels: heart; left lung; right lung; liver; spleen; pancreas; left kidney; right kidney; bladder; left femoral head; right femoral head
Dataset ID: pbc
Number of Images: 17092
Summary: A dataset of microscopic images of individual normal peripheral blood cells. The images were obtained from individuals without infection or hematologic or oncologic disease who were free of any pharmacologic treatment at the time of blood collection.
Cell class – Multiclass classification
There are 8 labels: neutrophil; eosinophil; basophil; lymphocyte; monocyte; immature granulocyte; erythroblast; platelet
Dataset ID: pneumonia
Number of Images: 5856
Summary: Pediatric chest X-ray images labeled for pneumonia classification
Pneumonia presence – Binary classification
There are 2 labels: NORMAL; PNEUMONIA
Disease class – Multiclass classification
There are 3 labels: NORMAL; BACTERIA; VIRUS
Dataset ID: skinl_clinic
Number of Images: 1011
Summary: A dataset of clinical color images of skin lesions, with corresponding labels for seven evaluation criteria and the diagnosis. For tasks with infrequent labels, grouped versions are provided in which the infrequent labels are merged into more frequent ones.
Diagnosis – Multiclass classification
There are 15 labels: basal cell carcinoma; blue nevus; clark nevus; combined nevus; congenital nevus; dermal nevus; dermatofibroma; lentigo; melanoma; melanosis; miscellaneous; recurrent nevus; reed or spitz nevus; seborrheic keratosis; vascular lesion
Diagnosis grouped – Multiclass classification
There are 5 labels: basal cell carcinoma; nevus; melanoma; miscellaneous; seborrheic keratosis
Pigment network – Multiclass classification
There are 3 labels: absent; typical; atypical
Blue whitish veil – Binary classification
There are 2 labels: absent; present
Vascular structures – Multiclass classification
There are 8 labels: absent; arborizing; comma; hairpin; within regression; wreath; dotted; linear irregular
Vascular structures grouped – Multiclass classification
There are 3 labels: absent; regular; irregular
Pigmentation – Multiclass classification
There are 5 labels: absent; diffuse regular; localized regular; diffuse irregular; localized irregular
Pigmentation grouped – Multiclass classification
There are 3 labels: absent; regular; irregular
Streaks – Multiclass classification
There are 3 labels: absent; regular; irregular
Dots and globules – Multiclass classification
There are 3 labels: absent; regular; irregular
Regression structures – Multiclass classification
There are 4 labels: absent; blue areas; white areas; combinations
Regression structures grouped – Binary classification
There are 2 labels: absent; present
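The grouped tasks merge rare fine-grained labels into broader ones, but the exact mapping is not spelled out in this listing. The sketch below assumes, from the label names alone, that all nevus subtypes form the "nevus" group and that the remaining rare diagnoses fall under "miscellaneous"; verify the mapping against the MIMeta toolbox before relying on it. `FINE_TO_GROUP` and `grouped_index` are hypothetical names. The same mapping would apply to skinl_derm, which shares these label sets.

```python
# Hypothetical fine-to-grouped diagnosis mapping, inferred from label names only.
FINE_TO_GROUP = {
    "basal cell carcinoma": "basal cell carcinoma",
    "melanoma": "melanoma",
    "seborrheic keratosis": "seborrheic keratosis",
}
# Assumed: every nevus subtype maps to the "nevus" group.
for name in ["blue nevus", "clark nevus", "combined nevus", "congenital nevus",
             "dermal nevus", "recurrent nevus", "reed or spitz nevus"]:
    FINE_TO_GROUP[name] = "nevus"
# Assumed: the remaining rare diagnoses are pooled into "miscellaneous".
for name in ["dermatofibroma", "lentigo", "melanosis", "miscellaneous", "vascular lesion"]:
    FINE_TO_GROUP[name] = "miscellaneous"

# Grouped label order as listed for the "Diagnosis grouped" task.
GROUPED_LABELS = ["basal cell carcinoma", "nevus", "melanoma",
                  "miscellaneous", "seborrheic keratosis"]


def grouped_index(fine_name: str) -> int:
    """Return the grouped-diagnosis class index for a fine-grained diagnosis name."""
    return GROUPED_LABELS.index(FINE_TO_GROUP[fine_name])


print(grouped_index("clark nevus"))  # 1
print(grouped_index("lentigo"))      # 3
```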
Dataset ID: skinl_derm
Number of Images: 1011
Summary: A dataset of dermoscopy color images of skin lesions, with corresponding labels for seven evaluation criteria and the diagnosis. For tasks with infrequent labels, grouped versions are provided in which the infrequent labels are merged into more frequent ones.
Diagnosis – Multiclass classification
There are 15 labels: basal cell carcinoma; blue nevus; clark nevus; combined nevus; congenital nevus; dermal nevus; dermatofibroma; lentigo; melanoma; melanosis; miscellaneous; recurrent nevus; reed or spitz nevus; seborrheic keratosis; vascular lesion
Diagnosis grouped – Multiclass classification
There are 5 labels: basal cell carcinoma; nevus; melanoma; miscellaneous; seborrheic keratosis
Pigment network – Multiclass classification
There are 3 labels: absent; typical; atypical
Blue whitish veil – Binary classification
There are 2 labels: absent; present
Vascular structures – Multiclass classification
There are 8 labels: absent; arborizing; comma; hairpin; within regression; wreath; dotted; linear irregular
Vascular structures grouped – Multiclass classification
There are 3 labels: absent; regular; irregular
Pigmentation – Multiclass classification
There are 5 labels: absent; diffuse regular; localized regular; diffuse irregular; localized irregular
Pigmentation grouped – Multiclass classification
There are 3 labels: absent; regular; irregular
Streaks – Multiclass classification
There are 3 labels: absent; regular; irregular
Dots and globules – Multiclass classification
There are 3 labels: absent; regular; irregular
Regression structures – Multiclass classification
There are 4 labels: absent; blue areas; white areas; combinations
Regression structures grouped – Binary classification
There are 2 labels: absent; present