Unsupervised Multi-Latent Space RL Framework for Video Summarization in Ultrasound Imaging

Cited by: 2
Authors
Mathews, Roshan P. [1, 2]
Panicker, Mahesh Raveendranatha [1]
Hareendranathan, Abhilash R. [3]
Chen, Yale Tung [4]
Jaremko, Jacob L.
Buchanan, Brian [5]
Narayan, Kiran Vishnu [6]
Kesavadas, C. [7]
Mathews, Greeta [8]
Affiliations
[1] Indian Inst Technol Palakkad, Ctr Computat Imaging, Dept Elect Engn, Kozhippara, India
[2] Univ Calif Los Angeles, Dept Elect & Comp Engn, Los Angeles, CA 90095 USA
[3] Univ Alberta, Radiol & Diagnost Imaging Dept, Edmonton, AB, Canada
[4] Hosp Univ Puerta Hierro, Majadahonda, Spain
[5] Univ Alberta, Crit Care Med Dept, Edmonton, AB, Canada
[6] Govt Med Coll, Thiruvananthapuram, India
[7] Sree Chitra Tirunal Inst Med Sci & Technol, Thiruvananthapuram, India
[8] Bhagwan Mahaveer Jain Hosp, Radiol Dept, Bangalore, India
Keywords
Ultrasound; video summarization; unsupervised reinforcement learning; attention ensemble encoders; classification
DOI
10.1109/JBHI.2022.3208779
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The COVID-19 pandemic has highlighted the need for a tool to speed up triage in ultrasound scans and provide clinicians with fast access to relevant information. To this end, we propose a new unsupervised reinforcement learning (RL) framework for summarizing ultrasound videos, with novel rewards that enable unsupervised learning and avoid tedious and impractical manual labelling. The proposed framework delivers video summaries with classification labels and segmentations of key landmarks, which enhances its utility as a triage tool in the emergency department (ED) and in telemedicine. Using an attention ensemble of encoders, each high-dimensional image is projected into a low-dimensional latent space in terms of: a) reduced distance to a normal or abnormal class (classifier encoder), b) the topology of landmarks (segmentation encoder), and c) a distance- and topology-agnostic latent representation (autoencoders). The summarization network is implemented as a bi-directional long short-term memory (Bi-LSTM) network that uses the latent space representations from the encoders. Validation is performed on lung ultrasound (LUS) videos, a typical use case in telemedicine and ED triage, acquired from medical centers in different geographies (India and Spain). Trained and tested on 126 LUS videos, the proposed approach showed high agreement with the ground truth, with an average precision of over 80% and an average F1 score of well over 44 ± 1.7%. The approach reduced storage space by 77% on average, which can ease bandwidth and storage requirements in telemedicine.
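The reported figures (precision, F1 score, and storage reduction) can be illustrated with a minimal sketch. The abstract does not describe the paper's evaluation protocol, so everything below is an assumption made for illustration: the function name `summary_metrics`, the frame-level matching of selected frames against reference key frames, and the use of the fraction of dropped frames as a proxy for storage reduction. The toy frame indices are invented and are not data from the paper.

```python
from typing import Dict, Sequence


def summary_metrics(
    selected: Sequence[int], reference: Sequence[int], n_frames: int
) -> Dict[str, float]:
    """Frame-level precision, recall, F1, and fraction of frames dropped.

    Hypothetical helper: `selected` holds the frame indices kept in the
    summary, `reference` holds the ground-truth key-frame indices, and
    `n_frames` is the length of the full video.
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")

    sel, ref = set(selected), set(reference)
    true_positives = len(sel & ref)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(sel) if sel else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    # Assumption: storage scales with the number of frames kept.
    reduction = 1.0 - len(sel) / n_frames

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "reduction": reduction,
    }


# Toy example with made-up indices (not results from the paper).
metrics = summary_metrics(
    selected=[2, 3, 10, 11, 12],
    reference=[2, 3, 4, 5, 10, 11, 20, 21],
    n_frames=50,
)
print(metrics)
```

In this toy case the summary keeps 5 of 50 frames, 4 of which match the reference, so precision is high while recall, and therefore F1, is lower. A short summary that selects mostly correct frames but omits many reference frames produces this pattern.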
Pages: 227-238
Number of pages: 12