Mining and tracking massive text data: Classification, construction of tracking statistics, and inference under misclassification

Cited by: 9
Authors
Jeske, Daniel R. [1]
Liu, Regina Y. [2]
Affiliations
[1] Univ Calif Riverside, Dept Stat, Riverside, CA 92521 USA
[2] Rutgers State Univ, Hill Ctr, Dept Stat, Piscataway, NJ 08854 USA
Funding
National Science Foundation (USA)
Keywords
data mining; misclassification; risk indicator; text classification; tracking statistic
DOI
10.1198/004017006000000471
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
This article presents a comprehensive data-mining procedure for exploring large freestyle text datasets to discover useful features and develop suitable tracking statistics (often referred to as performance measures or risk indicators). The procedure includes text classification, construction of tracking statistics, inference under error measurements, and risk analysis. Some specific text analysis methodologies and tracking statistics are discussed. Several approaches for incorporating misclassified data or error measurements into the inference for tracking statistics are proposed and evaluated. Finally, as an illustrative example, the proposed data-mining procedure is applied to analyzing an aviation safety report repository to show its utility in aviation risk management or general decision-support systems.
Pages: 116-128
Page count: 13
Related papers
50 results in total
  • [1] Mining massive text data and developing tracking statistics
    Jeske, DR
    Liu, RY
    [J]. CLASSIFICATION, CLUSTERING, AND DATA MINING APPLICATIONS, 2004 : 495 - 510
  • [2] Mining Eye-Tracking Data for Text Summarization
    Taieb-Maimon, Meirav
    Romanovski-Chernik, Aleksandr
    Last, Mark
    Litvak, Marina
    Elhadad, Michael
    [J]. INTERNATIONAL JOURNAL OF HUMAN-COMPUTER INTERACTION, 2024, 40 (17) : 4887 - 4905
  • [3] Multidimensional Mining of Massive Text Data
    Zhang, Chao
    Han, Jiawei
    [J]. Synthesis Lectures on Data Mining and Knowledge Discovery, 2019, 11 (02) : 1 - 198
  • [4] A technology of text classification of data mining
    Yang, Bin
    Meng, Zhi-qing
    [J]. Xiangtan Daxue Ziran Kexue Xuebao, 2001, 23 (04) : 34 - 37
  • [5] Data Mining for Gaze Tracking System
    Heidenburg, Breanna
    Lenisa, Michael
    Wentzel, Daniel
    Malinowski, Aleksander
    [J]. 2008 CONFERENCE ON HUMAN SYSTEM INTERACTIONS, VOLS 1 AND 2, 2008 : 686 - 689
  • [6] JUSTIFICATION LOGIC, INFERENCE TRACKING, AND DATA PRIVACY
    Studer, Thomas
    [J]. LOGIC AND LOGICAL PHILOSOPHY, 2011, 20 (04) : 297 - 306
  • [7] Construction tracking: implications of logistics data
    Maxwell, Duncan
    Couper, Rachel
    [J]. CONSTRUCTION INNOVATION-ENGLAND, 2023, 23 (02) : 322 - 339
  • [8] Intelligent classification of construction quality problems based on unbalanced short text data mining
    Wang, Dan
    Yin, Kai
    Wang, Hailong
    [J]. AIN SHAMS ENGINEERING JOURNAL, 2024, 15 (10)
  • [9] Data Tracking Under Competition
    Bimpikis, Kostas
    Morgenstern, Ilan
    Saban, Daniela
    [J]. OPERATIONS RESEARCH, 2024, 72 (02) : 514 - 532
  • [10] Attributes in tracking and classification with incomplete data
    Drummond, OE
    [J]. SIGNAL AND DATA PROCESSING OF SMALL TARGETS 2004, 2004, 5428 : 476 - 496