Automatic Quality Control of Transportation Reports Using Statistical Language Processing

Cited by: 3
Authors
Gerber, Matthew S. [1]
Tang, Lu [2]
Affiliations
[1] Univ Virginia, Dept Syst & Informat Engn, Charlottesville, VA 22904 USA
[2] Univ Virginia, Dept Stat, Charlottesville, VA 22904 USA
Keywords
Natural language processing (NLP); quality control; transportation reports; SEARCH
DOI
10.1109/TITS.2013.2265892
Chinese Library Classification (CLC) number
TU [Building science]
Discipline classification code
0813
Abstract
The processes of developing, monitoring, and maintaining transportation systems produce large volumes of information. Human fieldworkers are often responsible for gathering this information, and despite their best efforts, they inevitably introduce errors into the collected data. This is a critical problem because 1) the collected data are used to justify key infrastructure maintenance and development decisions, and 2) the volume of unstructured information (e.g., plain text) makes manual quality control prohibitively expensive. We introduce a solution to this problem in the example domain of vehicle accident reports. First, we analyzed a sample of accident reports and confirmed the existence of many data entry errors. Second, we developed and evaluated a statistical language processing approach that automatically identifies reports containing data entry errors. We tested a variety of system configurations on real-world data and compared their performance with multiple baseline methods. The best configuration achieved a performance score of 84%, far outperforming the baseline methods. Our results and analyses have quality control implications for any data source that pairs structured text (e.g., coded fields) with unstructured text.
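The abstract does not describe the model, features, or metric behind the 84% score, so what follows is a hypothetical illustration only and not the authors' method. One generic way to check a coded field against the narrative it accompanies is to train a text classifier that predicts the coded value from the narrative, then flag any report whose recorded value receives low probability. This sketch assumes a multinomial naive Bayes classifier with add-one smoothing. The class and function names (`NaiveBayes`, `flag_suspect_reports`), the 0.2 threshold, and the toy crash-type labels are all invented for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the narrative and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing (assumed model)."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total = len(labels)
        return self

    def log_scores(self, text):
        """Unnormalized log P(label) + sum of log P(token | label) for each label."""
        tokens = tokenize(text)
        v = len(self.vocab)
        scores = {}
        for label, count in self.label_counts.items():
            n = sum(self.word_counts[label].values())
            s = math.log(count / self.total)
            for t in tokens:
                s += math.log((self.word_counts[label][t] + 1) / (n + v))
            scores[label] = s
        return scores

    def posterior(self, text):
        """Normalize the log scores into probabilities (shifted by the max for stability)."""
        scores = self.log_scores(text)
        m = max(scores.values())
        exp = {k: math.exp(s - m) for k, s in scores.items()}
        z = sum(exp.values())
        return {k: e / z for k, e in exp.items()}


def flag_suspect_reports(model, reports, threshold=0.2):
    """Return indices of (narrative, coded_value) pairs whose coded value
    has posterior probability below the (hypothetical) threshold."""
    flagged = []
    for i, (narrative, coded) in enumerate(reports):
        if model.posterior(narrative).get(coded, 0.0) < threshold:
            flagged.append(i)
    return flagged


# Toy training data: narrative text paired with a coded crash type (invented).
train_texts = [
    "vehicle struck deer crossing the road",
    "deer ran into roadway and was hit",
    "car rear ended another car at light",
    "driver rear ended stopped vehicle",
]
train_labels = ["animal", "animal", "rear_end", "rear_end"]
model = NaiveBayes().fit(train_texts, train_labels)

# The second report's coded value contradicts its narrative.
reports = [
    ("truck hit a deer on the highway", "animal"),
    ("vehicle rear ended a van", "animal"),
]
print(flag_suspect_reports(model, reports))  # → [1]
```

Thresholding the posterior of the recorded value, rather than flagging only when the top prediction disagrees, lets a reviewer trade recall against the number of reports sent for manual inspection.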
Pages: 1681-1689
Page count: 9