Pathology Datasets for AI
Abstract:
Histopathology plays a crucial role in the diagnosis of many diseases, especially cancers, for which the correct classification of tissue samples significantly influences patient treatment. The growing use of artificial intelligence (AI) in digital pathology offers opportunities to improve the speed, accuracy, and scalability of the diagnostic process. Well-structured, annotated histopathology datasets are essential to this advancement. This study provides an extensive overview of publicly available datasets tailored specifically for histopathology-related AI and machine learning research. Our review yielded 151 datasets spanning a range of tissue types and cancers (gastrointestinal, brain glioma, lung adenocarcinoma, and others). We also categorize the datasets by number of patients, organ, staining, magnification, scanner, size, collection method, year, and resolution. We analyze several key and widely used datasets, including but not limited to CAMELYON, TUPAC, MIDOG, MoNuSeg, and BreakHis. We believe this review will help computational histopathology research by providing a comprehensive understanding of the available datasets, their structures, and their specific applications. By documenting and evaluating these resources, the review lets researchers choose relevant datasets more effectively when creating AI models for specific tasks, such as cancer diagnosis, treatment response prediction, and tissue classification. Within the field, standardizing images across these datasets can facilitate collaboration among data-generating experts, pathologists, and AI model developers, and can support reproducibility, benchmarking, and evaluation. Furthermore, this evaluation will help identify gaps in the availability of current datasets, for example datasets that combine histopathological, radiological, and genomic data, along with the need for additional, more diverse datasets that incorporate multimodal data. Closing these gaps will be essential to creating AI models that are applicable to different types of institutions and patient groups.
Links to the datasets covered: see https://github.com/prasathlab/PathAIData
References:
A. R. Chinnachinnanagari, S. S. Debsarkar, V. B. S. Prasath. Pathology public datasets for artificial intelligence: A systematic review. Journal of Imaging Informatics in Medicine, 2026. doi:TBA