FF-BERT: A BERT-based ensemble for automated classification of web-based text on flash flood events

Cited by: 4
Authors
Wilkho, Rohan Singh [1]
Chang, Shi [2]
Gharaibeh, Nasir G. [1]
Affiliations
[1] Texas A&M Univ, Zachry Dept Civil & Environm Engn, College Stn, TX 77840 USA
[2] Trimble Inc, Westminster, CO 80021 USA
Keywords
Flash flood; Text classification; Multi-label text classification; BERT
DOI
10.1016/j.aei.2023.102293
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The web is a rich information repository that can be mined to uncover additional data about past flash flood (FF) events, currently missing from existing structured databases. However, this information originates from multiple sources (news articles, government records, and weather records, among others) and may cover several topics. Furthermore, these topics may be disproportionately covered on the web. The large size and heterogeneous nature of web information render manual review difficult. To address this challenge, we have developed a multi-label text classification model, FF-BERT. FF-BERT is designed to classify FF-related web paragraphs into one or more of seven categories: (1) Damage and Economic Impact (DI), (2) Fatalities, Injuries, and Rescue (FIR), (3) Hydrometeorology (HM), (4) Warning and Emergency (WE), (5) Response and Recovery (RR), (6) Public Health (PH), and (7) Mitigation (MG). To develop FF-BERT, we labeled 21,180 paragraphs from FF-related webpages and performed experiments with multiple model architectures based on the widely used language model Bidirectional Encoder Representations from Transformers (BERT). Our final model outperforms the baseline by 11.83%, as measured by the micro-F1 score. In addition, FF-BERT significantly improves the prediction of minority labels (RR: 32.1%, PH: 260.4%, and MG: 138.6%). We demonstrate using real-world examples that FF-BERT can be used to uncover new information about flash flood events. This information can be used to enhance existing databases, such as NOAA's Storm Events Database.
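The abstract reports results as a micro-F1 score over seven labels. As a reading aid, the following is a minimal sketch of how micro-F1 is commonly computed for multi-label predictions; it is not the authors' code. The function name `micro_f1`, the `LABELS` ordering, and the toy label matrices are assumptions made for illustration only, and the numbers are made up rather than taken from the paper.

```python
from typing import Sequence

# Category abbreviations as listed in the abstract; the column order is an assumption.
LABELS = ["DI", "FIR", "HM", "WE", "RR", "PH", "MG"]


def micro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over every (paragraph, label) pair."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: 3 paragraphs, one 0/1 indicator per label in LABELS.
y_true = [
    [1, 0, 1, 0, 0, 0, 0],  # DI and HM
    [0, 1, 0, 0, 0, 0, 0],  # FIR
    [0, 0, 0, 0, 1, 0, 1],  # RR and MG
]
y_pred = [
    [1, 0, 0, 0, 0, 0, 0],  # misses HM
    [0, 1, 0, 0, 0, 0, 0],  # exact match
    [0, 0, 0, 0, 1, 1, 1],  # adds a spurious PH
]

score = micro_f1(y_true, y_pred)
print(f"micro-F1 = {score:.2f}")
```

Because micro-averaging pools counts across all labels, frequent categories dominate the score, which is one reason per-label results for minority categories such as RR, PH, and MG are often reported separately.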
Pages: 12