Original articles
Vol. 117: Issue 5 - October 2025
AI-assisted sentinel lymph node examination and metastatic detection in breast cancer: the potential of ChatGPT for digital pathology research
Summary
Objective. Traditional pathological examination of lymph nodes is labor-intensive and has shown variability in diagnostic accuracy. Recent advancements in artificial intelligence (AI) provide promising opportunities to enhance and standardize pathological workflows. AI-based image analysis models, particularly those utilizing deep learning algorithms, have demonstrated potential in automating and improving diagnostic accuracy in histopathology. This study aimed to evaluate the performance of a novel AI model known as ChatGPT-4 in detecting metastatic involvement in sentinel lymph nodes (SLNs) from breast cancer cases.
Methods. We utilized digital slides from frozen sections, which are commonly employed intraoperatively, to assess the model’s diagnostic accuracy. A total of 90 SLNs were retrospectively collected and analyzed using ChatGPT-4. The generated diagnoses were evaluated by two senior pathologists.
Results. The AI model achieved an overall accuracy of 92.2%, with a sensitivity of 100% and specificity of 80.6%. The study highlights the practical applicability of AI in diagnosing SLN metastasis, emphasizing the importance of frozen sections in real-world scenarios.
Conclusions. These findings suggest that integrating AI models like ChatGPT-4 into pathological workflows could enhance diagnostic accuracy and efficiency in breast cancer treatment.
Introduction
Breast cancer remains one of the most common malignancies affecting women worldwide 1.
It is well established that the most relevant prognostic factor in breast cancer patients is the extent of disease spread outside the breast, particularly in the axillary lymph nodes 2. In this regard, lymph node involvement serves as a crucial determinant for selecting the most appropriate therapeutic approach in breast cancer patients 3. However, pathological examination of lymph nodes in breast cancer is a labor-intensive and time-consuming process, with studies indicating that the diagnostic performance of pathologists has room for improvement 4.
Recent advancements in artificial intelligence (AI) offer promising avenues to augment and standardize pathological workflows 5,6. AI-based image analysis models, powered by deep learning algorithms, have shown the potential to automate and enhance diagnostic accuracy in histopathology 5,6. An increasing number of studies have focused on the detection of breast cancer metastases within digital whole-slide images of breast lymph nodes stained with hematoxylin and eosin 7-9. Notably, the CAMELYON16 challenge served as a platform for evaluating the efficacy of AI algorithms designed to automate the identification of metastases in these images 6. Among the 32 algorithms assessed, seven surpassed the performance of a panel of 11 pathologists working under time-constrained conditions 6. Furthermore, the most effective AI algorithm demonstrated a level of accuracy exceeding that of the top-performing pathologist within the study 6.
In the present study we aimed to evaluate the performance of an easy-to-use, widely accessible, AI model known as ChatGPT-4. This AI model was tested for its accuracy in detecting metastatic involvement in sentinel lymph nodes from breast cancer cases.
Unlike pathology-specific algorithms, our study deliberately employed a general-purpose multimodal large language model. This exploratory choice was intended not to replicate the benchmarks of specialized tools, but rather to investigate whether a generalistic AI model could approach histopathological interpretation differently. In this sense, we consider our approach innovative and complementary, opening a distinct line of inquiry alongside established, domain-specific systems.
Materials and methods
AI MODEL DESCRIPTION
The AI model utilized in this study, named ChatGPT-4, was developed by OpenAI.
This AI model has an integrated functionality called GPT-4 with Vision (GPT-4V) that allows users to directly upload and analyze images, including medical images such as histopathology slides.
ChatGPT-4 is based on a transformer architecture. Unlike specialized deep-learning-based histopathology models that rely on CNNs for feature extraction and image classification, GPT-4V employs multimodal transformer-based processing to analyze and interpret images.
Case selection
A total of 90 sentinel lymph nodes (SLNs) from an equal number of breast carcinoma resection specimens were retrospectively collected from the archives of our institution. Of these, 54 cases were confirmed to be metastatic, and 36 were negative for metastatic involvement. Among the metastatic cases, 33 were classified as macrometastases (≥2 mm) and 21 as micrometastases (0.2–2 mm). The study cohort included 77 cases of ductal carcinoma NOS, of which 48 were metastatic, and 13 cases of lobular carcinoma, with 6 showing lymph node metastases. Within the ductal carcinoma group, 30 cases presented macrometastases, while 18 were micrometastases. In the lobular carcinoma group, 4 cases had macrometastases, and 2 had micrometastases. No cases with isolated tumor cells (ITCs) were included in this study.
The study was conducted with the use of frozen section photomicrographs taken from sentinel axillary lymph nodes. All cases were anonymized, and each image was digitized at 10X or 20X magnification for analysis.
The histological slides were digitized using Aperio Scanscope with a 20X objective lens, with a pixel size of 0.5 μm/pixel.
Since GPT-4 currently does not support full whole-slide image uploads, pre-selected regions of interest (ROI) were uploaded, rather than entire scanned slides.
ROIs were selected based on pathologist review to ensure that representative areas containing lymph node tissue were analyzed.
The images analyzed in this study were small, de-identified ROIs of lymph-node parenchyma, stripped of all labels, barcodes, and metadata, and thus impossible to link to an individual patient. Although inference was performed via the ChatGPT online interface, the platform does not function as a public cloud repository but rather as a vendor-hosted inference environment. When applied under appropriate safeguards, its data-handling profile is comparable to other commercial AI tools currently employed in digital pathology research.
These images were analysed by ChatGPT powered by GPT-4. The images were uploaded to ChatGPT (version GPT-4, assessed 10 November 2023); the following prompt was utilized: ‘The images provided show histological frozen sections from a sentinel lymph node taken from a patient with invasive carcinoma of the breast. Provide a potential diagnosis and suggest if the lymph node is positive or negative for metastatic involvement’.
The generated responses were afterward assessed by two senior pathologists to evaluate GPT-4’s descriptive accuracy and proficiency in histopathological diagnosis.
Ai-generated outputs
The following step-by step procedure was utilized to record AI-generated results:
- The AI model provided direct classifications based on uploaded histological images without external manipulation. The responses were recorded as raw outputs before any human interpretation.
- The AI’s assessments were categorized into predefined “positive” or “negative” outputs, avoiding ambiguous or subjective classifications.
- The AI-generated outputs were blindly compared to the gold standard pathology report without human intervention in modifying the AI’s response. This approach ensured that any bias in interpretation was minimized.
- To maintain objectivity, the AI’s responses were recorded prior to comparison with pathology diagnoses, ensuring that no retrospective bias was introduced by human interpretation.
PERFORMANCE EVALUATION
We calculated the AI model’s diagnostic accuracy, sensitivity, specificity, and positive predictive value (PPV) based on the results of these 90 cases. The reference standards were the final histopathological diagnoses verified by experienced pathologists.
Results
A total of 90 histological cases were submitted for evaluation using the AI model. Of these, 54 cases were confirmed to be metastatic, and 36 were confirmed to be negative for metastatic involvement. The AI model classified 61 cases as metastatic and 29 as negative. Clinicopathological features have been summarized in Table I. The average reading time of AI model was 2.53 seconds per uploaded image.
All 54 metastatic cases were correctly identified, achieving a sensitivity of 100% (Fig. 1). However, 7 negative cases (19.4% of all negative cases) were incorrectly classified as metastatic, resulting in a specificity of 80.6%. Overall, the AI model correctly classified 83 of the 90 cases, achieving an accuracy of 92.2%. The false-positive rate was 7.8%, with all misclassifications involving negative cases being interpreted as metastatic. Since GPT-4 is unable to generate heatmaps or highlight specific regions within the image, false-positive cases were retrospectively reviewed by two senior pathologists, who identified likely sources of misclassification. The false positivity was attributed to crushed lymphocytes and plasma cells (4 cases), histiocytes (1 case) and blood vessels (2 cases) (Fig. 2).
The preanalytical factors like variation in tissue processing, air bubbles, floaters, necrosis, and other artifacts did not seem to influence the effectiveness of the app.
False positive cases were then re-analysed by submitting permanent section photomicrographs of the same lymph-nodes. In this second analysis, the AI model correctly identified 5/7 cases. In detail, two cases showing a proliferation of blood vessels and crushed lymphocytes were still interpreted as metastatic while the remaining 5 cases were correctly interpreted as negative.
Discussion
Large Language Models (LLMs), advanced AI models trained on vast text datasets, have shown remarkable progress across diverse applications, including question answering, translation, and coding10. Among these models, ChatGPT has emerged as one of the most reliable and widely accessible chatbots 11. While ChatGPT has been extensively studied in healthcare, its application in pathology, particularly in diagnosing SNLs from images, has not been previously investigated 12-18.
This study explores the use of ChatGPT-4 to facilitate SLN screening, thereby enhancing diagnostic accuracy and improving the efficiency of pathologists.
The results of our study underscore the effectiveness of ChatGPT-4 in identifying metastatic involvement in sentinel lymph nodes from breast cancer cases. The model demonstrated high sensitivity, indicating that it successfully identified all true positive cases without missing metastatic foci. Although the specificity of 80.6% suggests that some benign cases were incorrectly flagged as positive, this trade-off is consistent with AI applications in diagnostics, prioritizing sensitivity to avoid false negatives.
Deep learning algorithms have already been successfully applied in several fields of medicine, including lymph nod metastasis detection 5-9. These innovative tools have demonstrated considerable potential, as evidenced by their performance in the CAMELYON challenges, where AI systems achieved diagnostic accuracy comparable to expert breast pathologists operating without time constraints and exceeded the performance of general pathologists under time pressures 6.
In the study by Steiner et al, the LYNA (Lymph Node Assistant) algorithm analyzed 70 whole-slide images (WSIs) showing improved accuracy and sensitivity for detecting micrometastases 19.
Recently, Challa et al. reported a large-scale study on AI-assisted LN metastasis detection using hematoxylin and eosin-stained slides from breast cancer specimens9. The AI tool demonstrated an overall sensitivity and negative predictive value (NPV) of 100%, positioning it as a potential screening tool preceding pathologists’ review 9.
Compared to the most relevant studies on this topic such as LYNA, and CAMELYON16, several differences in sensitivity, specificity, accuracy, and false positives emerged 6,19. Our study achieved 100% sensitivity, outperforming LYNA’s 91% sensitivity. However, our specificity was lower (80.6%) compared to LYNA’s 96%, with a higher false-positive rate (7.8%), where all misclassified cases were negative LNs interpreted as metastatic. In contrast, LYNA’s false positives (4%) were primarily due to benign glandular inclusions and histiocytes, while our misclassifications were caused by crushed lymphocytes (4 cases), histiocytes (1 case), and blood vessels (2 cases). CAMELYON16 models demonstrated near-perfect AUCs (~0.99), but sensitivity and specificity varied across different algorithms, with false positives arising from artifacts, germinal centers, and reactive changes6.
In this scenario, ChatGPT-based image analysis stands out as promising alternative to existing algorithms, eliminating the need for specialized training and expensive digital pathology workflows. GPT-4 provides an accessible and readily available AI tool for histopathology interpretation, although it is not yet a fully integrated, automated solution for pathology workflows 12-18. Unlike pathology-specific AI tools that can directly process whole-slide images from scanners and integrate into laboratory information systems, GPT-4 requires manual image export, uploading, and text-based prompting, which introduces additional steps that may not be practical in high-throughput clinical settings 6,9,19. Further development is needed to achieve integration with digital pathology systems and routine clinical practice.
The main strength of our study lies in our exclusive use of frozen sections. This methodological choice distinguishes our research from prior investigations that predominantly employed formalin-fixed paraffin-embedded (FFPE) sections. By focusing on frozen sections, we emphasize the practical applicability of ChatGPT in diagnosing SLN metastasis during intraoperative procedures.
Additionally, preanalytical variables, including tissue processing inconsistencies, air bubbles, floaters, necrosis, and other artifacts, did not significantly affect the tool’s performance, underscoring its robustness and utility.
In our study, only false positive cases were re-analyzed by submitting FFPE tissue section photomicrographs of the same lymph nodes. In this second analysis, the AI model accuracy improved.
We acknowledge that this approach introduces potential bias, as it does not systematically assess whether FFPE processing affects overall AI performance, including the detection of true positives and false negatives. While our findings suggest that higher-quality histological preparation improves specificity by reducing false positives, we cannot exclude the possibility that certain metastatic foci, particularly micrometastases, may be more challenging to detect in FFPE sections due to differences in tissue processing and staining. A more comprehensive evaluation comparing AI performance on both frozen and FFPE sections across all cases is necessary to determine whether FFPE provides a significant diagnostic advantage or introduces new detection challenges.
The study had limitations, including the small sample size and the use of frozen sections which may introduce artifacts impacting the model’s performance. Future research should focus on refining the model’s specificity and expanding the dataset to include a broader range of histological subtypes and tumor stages.
Another limitation of our study is that AI analysis was performed on pre-selected regions of interest rather than full-slide images. This approach, which is often used in studies, may overestimate performance, as it avoids common histological artifacts such as ink contamination, tissue folding, and debris, which are present in real-world diagnostic settings. However, ROI-based analysis carries potential translational strengths. By focusing on diagnostically relevant regions, this approach not only enables feasibility testing with a non-specialized model but may paradoxically approximate certain real-world applications more closely. In low-resource settings, where whole-slide scanners and high-performance computing are unavailable, a simple smartphone or digital camera image of a representative field could be rapidly analyzed by a general-purpose model, providing immediate feedback or second-opinion support. This points to a potential complementary role for ROI-driven AI, distinct from but synergistic with WSI-based pipelines, particularly in democratizing access to diagnostic support. Moreover, at present, ChatGPT-4 is not a certified medical device, and its use in diagnostics remains experimental 12-18. A key challenge in certifying ChatGPT-4 as a medical product is that its underlying model is continuously updated by OpenAI without prior notice, which could lead to performance variability in histopathological tasks such as LN metastasis detection 12-18. Without a locked, validated version, GPT-4 cannot currently meet these regulatory standards. However, it is highly probable that GPT’s performance in histological image analysis will improve significantly in the coming years as AI models become more specialized and pathology datasets are incorporated into their training. Future versions may offer greater accuracy, explainability, and consistency, potentially making them more viable for regulatory approval.
In conclusion, While AI-assisted models can greatly improve diagnostic efficiency and consistency, their implementation should complement rather than replace human expertise. For complex cases or ambiguous findings, the model’s predictions serve as a valuable aid, prompting further examination and reducing the cognitive load on pathologists. Nevertheless, integrating AI in pathology workflows requires careful consideration of inter- and intra-pathologist variability, model validation, and user training.
Conclusion
This study is not a recommendation for clinical implementation of ChatGPT-4 in pathology or any other diagnostic application. Rather, this work serves as an exploratory investigation into the capabilities of ChatGPT-4 in histopathology. The results observed in our study suggest that ChatGPT-4 shows potential for identifying metastatic involvement in sentinel lymph nodes of breast cancer. Further studies on larger cohorts are necessary to confirm our findings, evaluate the model’s performance across diverse datasets, and ensure its reliability in different histopathological settings.
CONFLICTS OF INTEREST STATEMENT
Authors declare no conflict of interest.
FUNDING
None.
AUTHORS’ CONTRIBUTIONS
Conception and design: GA, GFZ. Resources and methodology: GA, NN, AS. Collection and assembly of data: LM, AR, SM. Data analysis and interpretation: FD,GLC, CC. Writing—original draft preparation: AS, SM, GA. Writing—review and editing: GFZ BPU, MV. Supervision: FA, AM. Project administration: GFZ.
ETHICAL CONSIDERATION
The study complied with the Ethical Principles for Medical Research Involving Human Subjects according to the World Medical Association Declaration of Helsinki. The clinical information had been retrieved from the patients’ medical records and pathology reports. Patients’ initials or other personal identifiers did not appear in any image. Finally, all samples were anonymized before histology and immunohistochemistry. Therefore, no further ethical approval was necessary to perform the study.
The non-interventional, retrospective nature of our study did not require any informed consent, even if written informed consent had been obtained from each patient before surgical procedures.
History
Received: February 19, 2025
Accepted: September 23, 2025
Figures and tables
Figure 1. (A-E) Hematoxylin and eosin stained frozen section illustrating metastatic sentinel lymph nodes. These photomicrographs, were correctly classified as “positive for metastatic involvment” by the AI model.
Figure 2. Hematoxylin and eosin stained frozen section illustrating sentinel lymph nodes negative for metastatic involvment (A,C: frozen sections; B,D: permanent sections). (A) Frozen section of a negative sentinel lymph node characterized by a proliferation of blood vessels (arrows) that were interpreted as neoplastic cells by the AI model. (B) The case was re-analysed by submitting permanent section photomicrographs of the same lymph-node. In this second analysis, the proliferation of blood vessels (arrows) was still interpreted as metastatic by the AI model. (C) Frozen section of another false positive case where the presence of blood vessels (arrows) and crushed lymphocytes was incorrectly interpreted as a sign of metastatic involvment by the AI model. (D) Permanent section photomicrograph of the same lymph node was then submitted for a re-analysis. However, the proliferation of blood vessels (arrows) was still interpreted as metastatic by the AI model.
| Case characteristics | Total number |
|---|---|
| Submitted cases | 90 |
| Metastatic cases | 54 |
| Negative cases | 36 |
| Correct AI diagnoses | 83 |
| Incorrect AI diagnoses | 7 |
| Accuracy of AI Model | 92.2% |
| Sensitivity | 100% |
| Specificity | 80.6% |
References
- Guo L, Kong D, Liu J. Breast cancer heterogeneity and its implication in personalized precision therapy. Exp Hematol Oncol. 2023;12. doi:https://doi.org/10.1186/s40164-022-00363-1
- Bitencourt A, Rossi Saccarelli C, Morris E. Regional Lymph Node Involvement Among Patients With De Novo Metastatic Breast Cancer. JAMA Netw Open. 2020;3(10). doi:https://doi.org/10.1001/jamanetworkopen.2020.18790
- Tonellotto F, Bergmann A, de Souza Abrahão K. Impact of Number of Positive Lymph Nodes and Lymph Node Ratio on Survival of Women with Node-Positive Breast Cancer. Eur J Breast Health. 2019;15:76-84. doi:https://doi.org/10.5152/ejbh.2019.4414
- Vestjens J, Pepels M, de Boer M. Ann Oncol. 2012;23:2561-2566. doi:https://doi.org/10.1093/annonc/mds072
- Acs B, Rimm D. Not Just Digital Pathology, Intelligent Digital Pathology. JAMA Oncol. 2018;4:403-404. doi:https://doi.org/10.1001/jamaoncol.2017.5449
- Shafi S, Parwani A. Artificial intelligence in diagnostic pathology. Diagn Pathol. 2023;18. doi:https://doi.org/10.1186/s13000-023-01375-z
- Ehteshami Bejnordi B, Veta M, Johannes van Diest P. Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer. JAMA. 2017;318:2199-2210. doi:https://doi.org/10.1001/jama.2017.14585
- Liu Y, Kohlberger T, Norouzi M. Artificial Intelligence-Based Breast Cancer Nodal Metastasis Detection: Insights Into the Black Box for Pathologists. Arch Pathol Lab Med. 2019;143:859-868. doi:https://doi.org/10.5858/arpa.2018-0147-OA
- Challa B, Tahir M, Hu Y. Artificial Intelligence-Aided Diagnosis of Breast Cancer Lymph Node Metastasis on Histologic Slides in a Digital Workflow. Mod Pathol. 2023;36. doi:https://doi.org/10.1016/j.modpat.2023.100216
- Lobentanzer S, Feng S, Bruderer N. A platform for the biomedical application of large language models. Nat Biotechnol. Published online 2025. doi:https://doi.org/10.1038/s41587-024-02534-3
- Dave T, Athaluri S, Singh S. ChatGPT in medicine: an overview of its applications, advantages, limitations, future prospects, and ethical considerations. Front Artif Intell. 2023;6. doi:https://doi.org/10.3389/frai.2023.1169595
- Fatima A, Shafique M, Alam K. ChatGPT in medicine: A cross-disciplinary systematic review of ChatGPT’s (artificial intelligence) role in research, clinical practice, education, and patient interaction. Medicine (Baltimore). 2024;103. doi:https://doi.org/10.1097/MD.0000000000039250
- Omar M, Ullanat V, Loda M. ChatGPT for digital pathology research. Lancet Digit Health. 2024;6:e595-e600. doi:https://doi.org/10.1016/S2589-7500(24)00114-6
- Schukow C, Smith S, Landgrebe E. Application of ChatGPT in Routine Diagnostic Pathology: Promises, Pitfalls, and Potential Future Directions. Adv Anat Pathol. 2024;31:15-21. doi:https://doi.org/10.1097/PAP.0000000000000406
- Ding L, Fan L, Shen M. Evaluating ChatGPT’s diagnostic potential for pathology images. Front Med (Lausanne). 2025;11. doi:https://doi.org/10.3389/fmed.2024.1507203
- Oon M, Syn N, Tan C, Tan K, Ng S. Bridging bytes and biopsies: A comparative analysis of ChatGPT and histopathologists in pathology diagnosis and collaborative potential. Histopathology. 2024;84:601-613. doi:https://doi.org/10.1111/his.15100
- Hirosawa T, Harada Y, Tokumasu K, Ito T, Suzuki T, Shimizu T. Evaluating ChatGPT-4’s Diagnostic Accuracy: Impact of Visual Data Integration. JMIR Med Inform. 2024;12. doi:https://doi.org/10.2196/55627
- Hirosawa T, Harada Y, Mizuta K, Sakamoto T, Tokumasu K, Shimizu T. Diagnostic performance of generative artificial intelligences for a series of complex case reports. Digit Health. 2024;10. doi:https://doi.org/10.1177/20552076241265215
- Steiner D, MacDonald R, Liu Y. Impact of Deep Learning Assistance on the Histopathologic Review of Lymph Nodes for Metastatic Breast Cancer. Am J Surg Pathol. 2018;42:1636-1646. doi:https://doi.org/10.1097/PAS.0000000000001151
Downloads
License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Copyright
Copyright (c) 2025 Società Italiana di Anatomia Patologica e Citopatologia Diagnostica, Divisione Italiana della International Academy of Pathology
How to Cite
- Abstract viewed - 717 times
- PDF downloaded - 345 times

