Redefining Risk in Endometrial Cancer: Recurrence Prediction with Machine Learning
Endometrial cancer is the most common gynecologic malignancy in the United States, with mortality projected to increase by as much as 55 percent by 2030. Despite advances in adjuvant therapies, recurrence occurs in about 15 to 20 percent of patients.
A new study by Gonzalez Bosquet and colleagues published in JCO Precision Oncology introduces a multi-layered machine learning approach that integrates clinical, pathological, and genomic data to predict recurrence risk and compares it against traditional models.
The Limits of Conventional Stratification
Historically, clinicians have relied on clinicopathologic variables, such as tumor grade and International Federation of Gynecology and Obstetrics (FIGO) stage, to estimate endometrial cancer recurrence risk. The Cancer Genome Atlas (TCGA) introduced molecular subtypes, including POLE ultramutated, mismatch repair deficient, and TP53-abnormal types, which added further predictive value. Yet traditional models based solely on these clinical and molecular surrogates have typically yielded area under the curve (AUC) values under 0.70 and have lacked external validation.
Gonzalez Bosquet and colleagues aimed to address these gaps by training, validating, and testing recurrence prediction models using both machine learning and deep learning. Their retrospective case-control study used the Oncology Research Information Exchange Network (ORIEN) endometrial cancer dataset, which included 892 patients with an average follow-up of 31 months. Within the ORIEN data, 186 patients (20.8 percent) experienced recurrence of endometrial cancer.
Risk-Stratified Modeling by Design
Patients were stratified into three biologically distinct groups to preserve signal specificity within each phenotype:
- Low risk: FIGO stage I, grade 1–2 endometrioid histology (n = 329; 11.6 percent recurrence)
- High risk: FIGO stage II–IV or grade 3 endometrioid histology (n = 324; 21.3 percent recurrence)
- Nonendometrioid: Including serous, clear cell, carcinosarcoma, undifferentiated, or mixed histologies (n = 239; 33.1 percent recurrence)
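The three-way stratification above can be read as a simple decision rule. The following is a minimal sketch of that rule, not the authors' code: the `Patient` class and `risk_group` helper are hypothetical names, FIGO substages are ignored, and "high risk" is assumed to mean any endometrioid tumor that is not stage I, grade 1–2. The second half rebuilds approximate recurrence counts from the reported group sizes and rounded rates as a consistency check.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    figo_stage: int  # 1-4; substages ignored (simplifying assumption)
    grade: int       # 1-3
    histology: str   # e.g. "endometrioid", "serous", "clear cell"


def risk_group(p: Patient) -> str:
    """Assign one of the three strata described above (hypothetical helper)."""
    if p.histology.strip().lower() != "endometrioid":
        return "nonendometrioid"
    if p.figo_stage == 1 and p.grade in (1, 2):
        return "low"
    return "high"  # endometrioid, stage II-IV or grade 3


# Group sizes and recurrence rates as reported in the summary.
REPORTED = {
    "low": (329, 0.116),
    "high": (324, 0.213),
    "nonendometrioid": (239, 0.331),
}

total_patients = sum(n for n, _ in REPORTED.values())
# Rates are rounded in the text, so these counts are approximate reconstructions.
approx_recurrences = {g: round(n * r) for g, (n, r) in REPORTED.items()}
total_recurrences = sum(approx_recurrences.values())

print(risk_group(Patient(1, 2, "endometrioid")))
print(total_patients, total_recurrences)
```

Under these assumptions the reconstructed totals come to 892 patients and 186 recurrences, which agrees with the cohort figures reported for the ORIEN dataset.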
Initial clinical-only models achieved modest discrimination:
- Low-risk: AUC 0.56
- High-risk: AUC 0.70
- Nonendometrioid: AUC 0.65
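To make these AUC values concrete: AUC can be interpreted as the probability that a randomly chosen patient who recurred receives a higher predicted risk than a randomly chosen patient who did not, so 0.50 corresponds to chance and 1.00 to perfect ranking. The sketch below computes AUC from that pairwise definition. The scores and labels are invented toy data, not study data, and the `auc` function is an illustrative helper.

```python
def auc(scores, labels):
    """Fraction of (recurrence, non-recurrence) pairs ranked correctly.

    labels: 1 = recurrence, 0 = no recurrence. Ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (invented values): 3 recurrences, 4 non-recurrences.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.3, 0.5, 0.4, 0.2, 0.1]
toy_auc = auc(toy_scores, toy_labels)
print(round(toy_auc, 3))
```

In the toy data, 10 of the 12 recurrence versus non-recurrence pairs are ranked correctly, giving an AUC of about 0.833. By the same reading, an AUC of 0.56 means the model ranks such pairs correctly only slightly more often than a coin flip.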
In contrast, integrated genomic-clinical models performed markedly better. The machine learning models evaluated over 150,000 variables, including gene expression, isoform expression, long non-coding RNA, microRNA, pseudogene expression, single nucleotide variation, copy number variation, and fusion transcripts.
- Low-risk: Top-performing models included copy number variation and clinical data (AUC 0.97; P < .001). In low-risk patients, body mass index emerged as the only clinical predictor of recurrence (OR 1.06).
- High-risk: Models incorporating pseudogene expression and single nucleotide variation achieved AUCs ranging from 0.92 to 0.98 (P < .001). In high-risk endometrioid cases, serum albumin, bilirubin, red blood cell distribution width, and Hispanic ethnicity were significantly associated with recurrence risk.
- Nonendometrioid: Top predictive models included features such as SAMM50 and SELENOH variants, reaching AUCs of 0.96–0.98 (P < .001). For nonendometrioid cancers, histologic subtype and FIGO stage remained key predictors.
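An odds ratio close to 1, such as the 1.06 reported for body mass index in low-risk patients, can look negligible until it is compounded over a realistic difference in the predictor. The sketch below assumes the odds ratio is expressed per 1 kg/m² of body mass index and comes from a logistic-type model in which odds ratios multiply across units; the summary states neither, so both are assumptions. The function name is hypothetical.

```python
def odds_ratio_for_difference(or_per_unit: float, delta: float) -> float:
    """Odds ratio implied by a difference of `delta` units in the predictor.

    Assumes a model that is linear in the log-odds, so per-unit odds
    ratios multiply: OR(delta) = OR(1) ** delta.
    """
    return or_per_unit ** delta


or_5 = odds_ratio_for_difference(1.06, 5)
or_10 = odds_ratio_for_difference(1.06, 10)
print(round(or_5, 2), round(or_10, 2))
```

Under these assumptions, a 5-unit difference in body mass index corresponds to roughly 1.34 times the odds of recurrence and a 10-unit difference to roughly 1.79 times the odds.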
In total, over 300 model combinations were evaluated. Consistent predictors across multiple models included pseudogene expression, altered DNA repair pathway components, and mitochondrial function regulators associated with mitophagy.
However, in external validation on the TCGA endometrial cancer dataset, the performance of the top ORIEN-trained models dropped sharply:
- Low-risk: highest AUC achieved 0.58 (95 percent CI, 0.43–0.72)
- High-risk: highest AUC achieved 0.54 (95 percent CI, 0.45–0.64)
The authors cite substantial variable mismatch between datasets, lower recurrence incidence in TCGA, and technological disparities as key limitations. Modified “relearned” models, adapted for TCGA feature availability, also failed to recapture the original predictive accuracy.
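Both external-validation confidence intervals above contain 0.50, so neither result can be distinguished from chance-level ranking. The summary does not say how the intervals were computed. As one common approach, the sketch below estimates a percentile bootstrap interval for AUC on synthetic, deliberately uninformative scores. All data are randomly generated, and `auc` and `bootstrap_auc_ci` are illustrative helpers rather than the authors' method.

```python
import random


def auc(scores, labels):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for AUC (labels must be 0/1)."""
    n = len(scores)
    if not 0 < sum(labels) < n:
        raise ValueError("need at least one case of each class")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lost one class entirely
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic cohort: 15 recurrences, 85 non-recurrences, scores unrelated to outcome.
data_rng = random.Random(1)
sim_labels = [1] * 15 + [0] * 85
sim_scores = [data_rng.random() for _ in sim_labels]

point = auc(sim_scores, sim_labels)
ci_low, ci_high = bootstrap_auc_ci(sim_scores, sim_labels)
print(round(point, 2), round(ci_low, 2), round(ci_high, 2))
```

With few recurrence events the interval is typically wide, which is one reason a lower recurrence incidence in the validation set makes performance harder to confirm.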
Clinical and Translational Relevance
The results suggest that integrating complex genomic features can substantially improve on the prognostic power of clinical data alone, at least within the cohort used for training. If these integrated machine learning models can be validated with prospective genomic and clinical data, they offer a foundation for precision risk prediction in endometrial cancer, enabling a shift from reactive to proactive post-treatment surveillance.
Reference:
- Gonzalez Bosquet J, Polio A, George E, et al. Training, Validating, and Testing Machine Learning Prediction Models for Endometrial Cancer Recurrence. JCO Precis Oncol. 2025;9:e2400859. doi:10.1200/PO-24-00859