Diagnostic Performance of Universal-Learning Ultrasound AI Across Multiple Organs and Tasks: the UUSIC25 Challenge

Zehui Lin1
Luyi Han2
Xin Wang2
Ying Zhou1,3
Yanming Zhang1,3
Tianyu Zhang2
Lingyun Bao4
Jiarui Zhou1
Yue Sun1
Jieyun Bai6
Shuo Li7
Shandong Wu5
Dong Ni8
Ritse Mann2
Wendie Berg10
Dong Xu3
Tao Tan1
and the UUSIC25 Challenge Consortium

1Macao Polytechnic University, Macao, China
2Netherlands Cancer Institute, Amsterdam, The Netherlands
3Zhejiang Cancer Hospital, Hangzhou, China
4The First People’s Hospital of Hangzhou, Affiliated Hangzhou Hospital of Nanjing Medical University, Hangzhou, China
5Department of Radiology, University of Pittsburgh, PA, USA
6College of Information Science and Technology, Jinan University, Guangzhou, China
7Case Western Reserve University, USA
8School of Biomedical Engineering, Shenzhen University, Shenzhen, China
10University of Pittsburgh Medical Center, Pittsburgh, PA, USA


Abstract

Modern ultrasound systems are universal, cost-effective, and portable diagnostic tools capable of imaging the entire body. However, current artificial intelligence (AI) solutions remain fragmented into single-task, organ-specific tools. This critical gap between hardware versatility and software specificity limits workflow integration and the clinical utility of AI in general ultrasonography. To address this, we organized the Universal UltraSound Image Challenge 2025 (UUSIC25). Participants developed algorithms using a training set of 11,644 images aggregated from 12 discrete sources. Algorithms were evaluated on a fully independent, multi-center private test set of 2,479 images, composed of held-out internal samples and a cohort from an external center completely unseen during training to assess generalization. The top-ranking algorithm (SMART) achieved a macro-averaged Dice similarity coefficient (DSC) of 0.854 across 5 segmentation tasks and a macro-averaged area under the ROC curve (AUC) of 0.766 for binary classification tasks. While models demonstrated high capability in anatomical segmentation, performance variability was observed in complex diagnostic tasks subject to domain shift. General-purpose AI models can achieve high diagnostic accuracy and efficiency across multiple ultrasound tasks using a single network architecture. However, significant performance degradation on data from unseen institutions suggests that future development must prioritize domain generalization techniques before clinical deployment is feasible.
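For readers unfamiliar with the segmentation metric, the following is a minimal sketch of the Dice similarity coefficient and of macro-averaging across tasks. The function names, the flat 0/1 mask representation, and the empty-mask convention are assumptions made for illustration; the challenge's exact evaluation protocol (for example, per-image versus per-dataset averaging) is not described on this page.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient for two flat binary masks (sequences of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


def macro_average(per_task_scores):
    """Unweighted mean over tasks, so every task counts equally regardless of size."""
    if not per_task_scores:
        raise ValueError("at least one task score is required")
    return sum(per_task_scores.values()) / len(per_task_scores)


if __name__ == "__main__":
    # Toy masks, not challenge data.
    print(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]))
    # Hypothetical task names and scores, for illustration only.
    print(macro_average({"task_a": 0.5, "task_b": 1.0}))
```

Macro-averaging weights each task equally, so a small task with a low score pulls the summary down as much as a large one would.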


Challenge Design and Data



Figure 1. Study Flow Diagram. Data collection, source attribution, and stratification logic. The diagram illustrates the explicit separation of data streams: public datasets were utilized exclusively for training to promote generalization (n=10,010), while internal private data were stratified across all sets (n=5,499). Data from the external center (Netherlands Cancer Institute, NKI; n=512) served as a strictly held-out test set.
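The counts in the Figure 1 caption and the abstract can be cross-checked with simple arithmetic. The sketch below assumes that all 10,010 public images belong to the training set (the caption says public data were used exclusively for training) and that all 512 external images belong to the test set. The variable names are illustrative, and reading the remaining images as a validation split is an assumption that this page does not state.

```python
# Counts quoted on this page (abstract, Key Findings, Figure 1 caption).
SOURCE_COUNTS = {"public": 10_010, "internal_private": 5_499, "external_nki": 512}
TOTAL_IMAGES = 16_021
TRAIN_IMAGES = 11_644
TEST_IMAGES = 2_479


def split_accounting():
    """Derive per-split internal counts under the assumptions stated above."""
    # Assumption: every public image is in the training set.
    internal_in_train = TRAIN_IMAGES - SOURCE_COUNTS["public"]
    # Assumption: every external (NKI) image is in the test set.
    internal_in_test = TEST_IMAGES - SOURCE_COUNTS["external_nki"]
    # Images in neither the training nor the test set (possibly validation).
    remaining = TOTAL_IMAGES - TRAIN_IMAGES - TEST_IMAGES
    return {
        "internal_in_train": internal_in_train,
        "internal_in_test": internal_in_test,
        "remaining": remaining,
    }


if __name__ == "__main__":
    # The three sources should add up to the reported total.
    assert sum(SOURCE_COUNTS.values()) == TOTAL_IMAGES
    print(split_accounting())
```

Under these assumptions the 1,898 images outside the training and test sets are all internal private data (5,499 - 1,634 - 1,967 = 1,898), which is consistent with internal data being stratified across all sets.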


Key Findings

Question: How effectively can a single general-purpose deep learning model handle multi-organ segmentation and classification tasks in clinical ultrasound?

Findings: In the UUSIC25 challenge involving 15 algorithms and 16,021 images across 7 anatomical regions, the winning query-driven Transformer model achieved high diagnostic accuracy (e.g., a Dice score of 0.942 for fetal head segmentation and an AUC of 0.837 for breast malignancy classification) together with high efficiency. Performance was assessed on a fully private, multi-center test set containing data from a completely unseen institution, where significant degradation was observed.

Meaning: These results suggest that developing high-performing, "all-in-one" clinical ultrasound AI systems is feasible, moving beyond the fragmented single-task paradigm; this paves the way for next-generation AI assistants that can streamline workflows and adapt to diverse clinical scenarios without manual intervention.


Results

Global Landscape and Performance Benchmarking

Figure 2. (a) Global Participation & Data Integration Network spanning five continents. (b) Methodological Configuration Matrix summarizing architectural choices for the top-10 teams. (c) Multi-Organ Diagnostic Versatility and Efficiency Panel showing segmentation (DSC) and classification (AUC) performance across anatomical regions. The winning model (SMART) demonstrates consistent coverage across tasks. (d) Diagnostic Precision (ROC Curves) for Breast Malignancy and Fatty Liver tasks on the private test set.
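Panel (d) reports ROC curves, which are summarized elsewhere on this page as AUC values. As a minimal sketch of what that number measures, the function below computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the toy inputs are illustrative assumptions; the challenge's actual evaluation code is not shown on this page.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy labels (1 = positive finding) and model scores, not challenge data.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of cases, which is fine for illustration; a rank-based implementation would scale better on large test sets.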



Publications

Diagnostic Performance of Universal-Learning Ultrasound AI Across Multiple Organs and Tasks: the UUSIC25 Challenge
Zehui Lin, Luyi Han, Xin Wang, Ying Zhou, Yanming Zhang, Tianyu Zhang, Lingyun Bao, Jiarui Zhou, Yue Sun, Jieyun Bai, Shuo Li, Shandong Wu, Dong Ni, Ritse Mann, Wendie Berg, Dong Xu, Tao Tan and the UUSIC25 Challenge Consortium
arXiv, 2026





Acknowledgements

This work was supported by the Science and Technology Development Fund, Macau SAR (File no. 0004/2025/ASJ) under the FDCT-FAPESP Joint Funding Scheme; the Shenzhen Medical Research Fund (Grant No. D2501013); and the Macao Polytechnic University Grant (Grant No. RP/FCA-17/2025). We thank the organizing committee of MICCAI 2025 for hosting the challenge.



Webpage template modified from Richard Zhang.