Artificial Intelligence in Histopathology and Cytopathology

06/29/2026
Key Takeaways
- High diagnostic accuracy was described across reviewed tasks, with controlled evaluations often reporting performance comparable to expert pathologists.
- The six included reviews covered use cases such as prostate carcinoma, lymphoma, metastatic lymph node detection, and glioma grading.
- Heterogeneity in platforms, staining, and reference standards was highlighted, as were the mostly retrospective evidence base and limited prospective validation. Standardized reporting and real-world implementation trials were discussed as directions for future research.
The umbrella review focused on diagnostic pathology evidence in histopathology and cytopathology and drew only from existing systematic reviews and meta-analyses. To keep the scope centered on pathology-specific applications, the literature search excluded non-pathological imaging such as radiology and endoscopy. Six systematic reviews formed the evidence base for comparisons across pathology tasks, and their methodological quality was appraised with AMSTAR-2.
Within the included reviews, use cases spanned metastatic lymph node detection, glioma grading, prostate carcinoma, and lymphoma-related diagnostic tasks. Across those areas, the synthesis incorporated sensitivity, specificity, and AUC when those measures were sufficiently comparable between underlying reviews and study questions. Deep learning algorithms were described as the predominant AI approach across the pathology tasks represented in the synthesis. Performance was often comparable to that of expert pathologists, but those comparisons were reported in controlled settings rather than routine clinical practice.
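For readers less familiar with the three metrics named above, the following is a minimal sketch of how sensitivity, specificity, and AUC are conventionally computed for a binary classifier. It is not the review's method. The labels, scores, 0.5 threshold, and function names are all hypothetical and chosen only for illustration; none of the numbers come from the review.

```python
# Illustrative only: toy labels and scores, NOT data from the review.
# 1 = disease present (reference standard), 0 = disease absent.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical model output scores in [0, 1].
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
# Assumed decision threshold of 0.5 (an arbitrary choice for this sketch).
preds = [1 if s >= 0.5 else 0 for s in scores]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # Sensitivity: share of true positives among all diseased cases.
    # Specificity: share of true negatives among all non-diseased cases.
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, y_score):
    """AUC as the probability that a random positive case outscores a
    random negative case, counting ties as one half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


sens, spec = sensitivity_specificity(labels, preds)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc(labels, scores):.4f}")
```

Sensitivity and specificity depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds. That difference is one reason the measures can only be pooled when the underlying studies are sufficiently comparable.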
The review also described substantial heterogeneity in scanning platforms, staining, and reference standards, which complicated direct comparison across reviews and tasks. Much of the underlying primary evidence came from retrospective studies, limiting how confidently the results could be extended beyond development settings. The lack of multicenter prospective validation was another major gap in the available evidence base. Standardized reporting and real-world clinical implementation trials were identified as directions for future research.
Overall, the synthesis supported strong diagnostic performance while leaving broader real-world validation unresolved.
