UniUSNet: A Promptable Framework for Universal Ultrasound Disease Prediction and Tissue Segmentation

Zehui Lin1
Zhuoneng Zhang1
Xindi Hu2
Zhifan Gao3
Xin Yang4
Yue Sun1
Dong Ni4
Tao Tan1

1Faculty of Applied Sciences, Macao Polytechnic University
2Shenzhen RayShape Medical Technology Co. Ltd.
3School of Biomedical Engineering, Sun Yat-sen University
4School of Biomedical Engineering, Shenzhen University


Abstract

Ultrasound is widely used in clinical practice due to its affordability, portability, and safety. However, current AI research often treats disease prediction and tissue segmentation as separate problems. We propose UniUSNet, a universal framework for ultrasound image classification and segmentation that handles various ultrasound types, anatomical positions, and input formats, excelling at both tasks. Trained on a comprehensive dataset with over 9.7K annotations from 7 distinct anatomical positions, our model matches state-of-the-art performance and surpasses single-dataset and ablated models. Zero-shot and fine-tuning experiments show strong generalization and adaptability with minimal fine-tuning. We plan to expand the dataset and refine the prompting mechanism; model weights and code are available on GitHub.


The BroadUS-9.7K Dataset



The BroadUS-9.7K dataset contains 9.7K annotations of 6.9K ultrasound images from 7 different anatomical positions. (a) The number of effective instances broken down by nature of the image, position, task, and input type. Note that a breast image can carry both segmentation and classification labels, and an image with segmentation labels can form three different input types. (b) The different anatomical positions and their corresponding public dataset abbreviations.
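To make the multi-label, multi-input structure concrete, the sketch below shows what a metadata record for one breast image might look like. The field names, values, and paths are hypothetical illustrations, not the dataset's released schema.

```python
# Hypothetical metadata record for one BroadUS-9.7K breast image.
# Field names and paths are illustrative only, not the actual schema.
sample = {
    "image": "breast/BUSI_0001.png",
    "position": "breast",                # one of the 7 anatomical positions
    "nature": "tumor",                   # nature of the image (assumed value)
    "labels": {
        "classification": "benign",      # disease prediction label
        "segmentation": "breast/BUSI_0001_mask.png",
    },
}
# An image like this counts toward both tasks, and its segmentation mask
# lets it be rendered as three different input types during training.
```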


Architecture



UniUSNet is a general encoder-decoder model that uses prompts to handle multiple ultrasound tasks, such as segmentation and classification, simultaneously. A shared encoder extracts features, while task-specific decoders are conditioned on four types of prompts (nature, position, task, and input type), injected into each transformer layer via a prompt projection embedding, boosting the model's versatility and performance.
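As a rough illustration of the prompting mechanism, the minimal PyTorch sketch below projects a one-hot prompt vector into a token prepended to a transformer layer's input. It is written under our own assumptions (one-hot prompts, one linear projection per prompt type, and assumed option counts for the nature prompt); it is not the released implementation.

```python
import torch
import torch.nn as nn

class PromptProjection(nn.Module):
    """Project a one-hot prompt vector into a token prepended to a
    transformer layer's input (illustrative sketch, not the official code)."""
    def __init__(self, num_options: int, embed_dim: int):
        super().__init__()
        self.proj = nn.Linear(num_options, embed_dim)

    def forward(self, tokens: torch.Tensor, prompt: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, D) patch tokens; prompt: (B, num_options) one-hot
        prompt_token = self.proj(prompt).unsqueeze(1)    # (B, 1, D)
        return torch.cat([prompt_token, tokens], dim=1)  # (B, N+1, D)

# Four prompt types; the option counts for position/task/type follow the
# dataset description above, while the count for nature is assumed.
embed_dim = 96
prompt_sizes = {"nature": 2, "position": 7, "task": 2, "type": 3}
prompt_layers = nn.ModuleList(
    [PromptProjection(n, embed_dim) for n in prompt_sizes.values()]
)
```

In the full model, each transformer layer would apply projections like these so that its features are conditioned on all four prompts at once.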


Results

R1: OVERALL PERFORMANCE COMPARISON ON THE BROADUS-9.7K DATASET.

SAM's official weights perform poorly in zero-shot inference (37.12%) due to the domain gap between natural and medical images. SAMUS improves performance (80.65%) but does not surpass the Single model, likely because of dataset heterogeneity. Our automatic-prompt model achieves comparable segmentation results (80.01%) with 66% fewer parameters. Ablation studies show that UniUSNet (79.89%) outperforms both its ablated version (78.46%) and the Single model (78.43%), demonstrating the effectiveness of the prompts. Although UniUSNet and UniUSNet w/o prompt have fewer parameters, both perform better at classification than segmentation, possibly because of the network's multi-branch structure, which suggests the need for more balanced learning across tasks.

R2: Example segmentation results. Each column, from left to right: original image, SAM, SAMUS, Single, UniUSNet w/o prompt, UniUSNet (with prompt), and ground truth.

The segmentation examples show that UniUSNet outperforms SAM and the other baselines, with the nature and position prompts giving the model a deeper understanding of each task.

R3: t-SNE visualization.

We visualized the feature distributions of the BUS-BRA, BUSIS, and UDIAT datasets. The figure shows that features from the Single model exhibit a clear domain shift, while UniUSNet w/o prompt reduces this shift, indicating better domain adaptation. Adding prompts minimizes the domain offset further.
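For reference, a visualization along these lines can be produced with scikit-learn's t-SNE. The snippet below is a generic sketch, with `encoder_features.npy` and `dataset_ids.npy` as placeholder files standing in for encoder features extracted from the three datasets.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Placeholder inputs: encoder features (N, D) and an integer dataset id
# per sample (0 = BUS-BRA, 1 = BUSIS, 2 = UDIAT).
features = np.load("encoder_features.npy")
dataset_ids = np.load("dataset_ids.npy")

# Embed the features into 2-D and color each point by its source dataset.
embedded = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
for idx, name in enumerate(["BUS-BRA", "BUSIS", "UDIAT"]):
    mask = dataset_ids == idx
    plt.scatter(embedded[mask, 0], embedded[mask, 1], s=5, label=name)
plt.legend()
plt.savefig("tsne_features.png", dpi=200)
```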

R4: ADAPTER PERFORMANCE COMPARISON ON THE BUSI DATASET.

The table shows that UniUSNet w/o prompt and UniUSNet outperform the Single model, demonstrating better generalization and the effectiveness of the prompts. Additionally, the Adapter setup surpasses the Scratch setup with minimal fine-tuning, showing that our model adapts efficiently to new datasets.
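As a sketch of what an Adapter-style setup involves, the helper below freezes every parameter of a pretrained model except those whose names mark them as adapter modules. The naming convention and the loader in the usage comments are hypothetical, not the released API.

```python
import torch.nn as nn

def freeze_all_but_adapters(model: nn.Module, adapter_key: str = "adapter") -> None:
    """Freeze the pretrained backbone; leave only adapter parameters trainable."""
    for name, param in model.named_parameters():
        param.requires_grad = adapter_key in name

# Hypothetical usage for fine-tuning on BUSI:
# model = build_uniusnet_from_checkpoint("uniusnet.pth")  # assumed loader
# freeze_all_but_adapters(model)
# optimizer = torch.optim.AdamW(
#     (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```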


Video



Publications

UniUSNet: A Promptable Framework for Universal Ultrasound Disease Prediction and Tissue Segmentation.
Zehui Lin, Zhuoneng Zhang, Xindi Hu, Zhifan Gao, Xin Yang, Yue Sun, Dong Ni, Tao Tan
BIBM, 2024





Support

We provide a detailed data processing pipeline for the BroadUS-9.7K dataset (link), as well as a data demo for checking that the data format is prepared properly and for quickly starting experiments or inference (link).

Pretrained models can be downloaded here (link).




Acknowledgements

This work was supported by the Science and Technology Development Fund of Macao (0021/2022/AGJ and 0041/2023/RIB2).



Webpage template modified from Richard Zhang.