TDDA, a data mining tool for text databases: A case history in a lung cancer text database

Citations: 0
Authors
Goldman, JA [1]
Chu, W
Parker, DS
Goldman, RM
Affiliations
[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA
[2] Univ Osteopath Med & Hlth Sci, Des Moines, IA 50312 USA
Source
DISCOVERY SCIENCE | 1998 / Vol. 1532
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we give a case history illustrating the real-world application of a useful technique for data mining in text databases. The technique, Term Domain Distribution Analysis (TDDA), consists of keeping track of term frequencies for specific finite domains, and reporting significant deviations from the standard frequency distribution over each domain as a hypothesis. In the case study presented, the domain of terms was the pair {right, left}, over which we expected a uniform distribution. In analyzing term frequencies in a thoracic lung cancer database, the TDDA technique led to the surprising discovery that primary thoracic lung cancer tumors appear in the right lung more often than in the left lung, with a ratio of 3:2. Treating the text discovery as a hypothesis, we verified this relationship against the medical literature in which primary lung tumor sites were reported, using a standard chi-square (χ²) statistic. We subsequently developed a working theoretical model of lung cancer that may explain the discovery.
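The idea described in the abstract (count the terms of a finite domain, then test the counts against an expected distribution) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the tokenization rule, and the toy notes are all assumptions made here, and the p-value helper covers only the two-term case (one degree of freedom), such as {right, left} under a uniform expectation.

```python
import math
import re
from collections import Counter


def term_domain_counts(documents, domain):
    """Count case-insensitive whole-word occurrences of each domain term.

    Hypothetical helper: the paper does not specify its tokenization.
    """
    wanted = set(domain)
    counts = Counter({term: 0 for term in domain})
    for text in documents:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in wanted:
                counts[token] += 1
    return counts


def chi_square_statistic(counts, expected_probs):
    """Pearson chi-square statistic of observed counts vs. expected shares."""
    total = sum(counts[term] for term in expected_probs)
    if total == 0:
        raise ValueError("no domain terms were observed")
    return sum(
        (counts[term] - total * p) ** 2 / (total * p)
        for term, p in expected_probs.items()
    )


def p_value_one_df(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    For 1 df, P(X > x) = erfc(sqrt(x / 2)); this does not generalize to
    domains with more than two terms.
    """
    return math.erfc(math.sqrt(statistic / 2.0))


if __name__ == "__main__":
    # Invented toy notes, NOT data from the paper's lung cancer database.
    notes = [
        "Mass in the right upper lobe.",
        "Left lower lobe nodule.",
        "Right hilar tumor extending into the right main bronchus.",
    ]
    uniform = {"right": 0.5, "left": 0.5}
    observed = term_domain_counts(notes, uniform)
    stat = chi_square_statistic(observed, uniform)
    print(dict(observed), stat, p_value_one_df(stat))
```

A small p-value would only flag the deviation as a hypothesis worth checking, in the spirit of the abstract. Note also that a plain word count cannot tell anatomical uses of "right" from other senses of the word, so a real application would need some disambiguation step.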
Pages: 431-432
Number of pages: 2
Related papers
50 in total
  • [31] PubMeth: a cancer methylation database combining text-mining and expert annotation
    Ongenaert, Mate
    Van Neste, Leander
    De Meyer, Tim
    Menschaert, Gerben
    Bekaert, Sofie
    Van Criekinge, Wim
    NUCLEIC ACIDS RESEARCH, 2008, 36 : D842 - D846
  • [32] Text databases: One database model and several retrieval languages
    Ide, N
    COMPUTATIONAL LINGUISTICS, 1998, 24 (02) : 319 - 321
  • [33] Malware Detection by Text and Data Mining
    Sundarkumar, G. Ganesh
    Ravi, Vadlamani
    2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (ICCIC), 2013, : 566 - 571
  • [34] Data Analysis Support by Combining Data Mining and Text Mining
    Matsumoto, Tomoya
    Sunayama, Wataru
    Hatanaka, Yuji
    Ogohara, Kazunori
    2017 6TH IIAI INTERNATIONAL CONGRESS ON ADVANCED APPLIED INFORMATICS (IIAI-AAI), 2017, : 313 - 318
  • [35] Text Mining in Big Data Analytics
    Hassani, Hossein
    Beneki, Christina
    Unger, Stephan
    Mazinani, Maedeh Taj
    Yeganegi, Mohammad Reza
    BIG DATA AND COGNITIVE COMPUTING, 2020, 4 (01) : 1 - 34
  • [36] DATA PREPROCESSING IN WEB TEXT MINING
    Jiang Yongbo
    FIFTH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER THEORY AND ENGINEERING (ICACTE 2012), 2012, : 573 - 581
  • [37] Text and Data Quality Mining in CRIS
    Azeroual, Otmane
    INFORMATION, 2019, 10 (12)
  • [38] Analyzing Text Data for Opinion Mining
    Wei, Wei
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, 2011, 6716 : 330 - 335
  • [39] Pattern and Cluster Mining on Text Data
    Agnihotri, Deepak
    Verma, Kesari
    Tripathi, Priyanka
    2014 FOURTH INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS AND NETWORK TECHNOLOGIES (CSNT), 2014, : 428 - 432
  • [40] Text Mining in Big Data Analytics
    Cogburn, Derrick L.
    Hine, Michael J.
    Peladeau, Normand
    Yoon, Victoria Y.
    PROCEEDINGS OF THE 51ST ANNUAL HAWAII INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES (HICSS), 2018, : 584 - 586