Automatic bridge inspection database construction through hybrid information extraction and large language models

Authors
Zhang, Chenhong [1]
Lei, Xiaoming [2]
Xia, Ye [1, 3]
Sun, Limin [1, 3]
Affiliations
[1] Tongji Univ, Dept Bridge Engn, Shanghai, Peoples R China
[2] Hong Kong Polytech Univ, Dept Civil & Environm Engn, Hong Kong, Peoples R China
[3] Shanghai Qi Zhi Inst, Shanghai, Peoples R China
Source
Funding
National Natural Science Foundation of China
Keywords
Bridge inspection data; Natural language processing; Information extraction; Large language model; Pseudo label
DOI
10.1016/j.dibe.2024.100549
Chinese Library Classification number
TU [Building science]
Discipline classification code
0813
Abstract
Regular bridge inspections generate extensive reports that, while critical for maintenance, often remain underutilized because of their unstructured format. Traditional information extraction (IE) methods depend on intricate labeling schemes that require time-consuming, labor-intensive annotation. This paper presents a novel bridge inspection database construction method that leverages large language model (LLM)-assisted information extraction. First, we introduce a pseudo-labeling method that uses a closed-source LLM to generate high-quality labeled data. Next, we propose a hybrid extraction pipeline that extracts relevant information segments and processes them with a generation-based IE model fine-tuned on the pseudo-labeled data. Finally, the extracted data are used to construct the bridge inspection database. Validated on real-world data, the proposed method not only achieves higher extraction precision than the closed-source LLM used for pseudo-labeling but also outperforms traditional methods in both data preparation time and extraction accuracy. This approach provides a scalable solution for more proactive, data-driven bridge maintenance strategies.
Pages: 16