TDDA, a data mining tool for text databases: A case history in a lung cancer text database

Cited by: 0
Authors
Goldman, JA [1]
Chu, W
Parker, DS
Goldman, RM
Affiliations
[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA
[2] Univ Osteopath Med & Hlth Sci, Des Moines, IA 50312 USA
Source
DISCOVERY SCIENCE | 1998 / Vol. 1532
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we give a case history illustrating the real-world application of a useful technique for data mining in text databases. The technique, Term Domain Distribution Analysis (TDDA), consists of keeping track of term frequencies for specific finite domains and announcing, as a hypothesis, any significant difference from the standard frequency distribution over such a domain. In the case study presented, the domain of terms was the pair {right, left}, over which we expected a uniform distribution. In analyzing term frequencies in a thoracic lung cancer database, the TDDA technique led to the surprising discovery that primary thoracic lung cancer tumors appear in the right lung more often than in the left lung, with a ratio of 3:2. Treating the text discovery as a hypothesis, we verified this relationship against the medical literature in which primary lung tumor sites were reported, using a standard χ² statistic. We subsequently developed a working theoretical model of lung cancer that may explain the discovery.
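The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`domain_counts`, `chi_square_statistic`, `p_value_one_df`), the whole-word tokenization, and the toy records are all assumptions made for illustration. The 60/40 split is invented so that it mirrors the reported 3:2 ratio; it is not data from the paper. The sketch counts the terms of the domain {right, left} and compares the counts with the expected uniform distribution using Pearson's χ² goodness-of-fit statistic.

```python
import math
import re
from collections import Counter


def domain_counts(texts, domain):
    """Count whole-word, case-insensitive occurrences of each domain term."""
    wanted = {term.lower() for term in domain}
    counts = Counter({term: 0 for term in wanted})
    for text in texts:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in wanted:
                counts[token] += 1
    return counts


def chi_square_statistic(counts, expected_probs):
    """Pearson chi-square goodness-of-fit statistic against expected_probs."""
    total = sum(counts[term] for term in expected_probs)
    if total == 0:
        raise ValueError("no domain terms were observed")
    stat = 0.0
    for term, prob in expected_probs.items():
        expected = total * prob
        stat += (counts[term] - expected) ** 2 / expected
    return stat


def p_value_one_df(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    Valid only for a two-term domain such as {right, left}.
    """
    return math.erfc(math.sqrt(stat / 2.0))


# Invented toy records (NOT from the paper): 60 mention "right", 40 "left".
records = ["primary tumor in right lung"] * 60 + ["primary tumor in left lung"] * 40

counts = domain_counts(records, {"right", "left"})
stat = chi_square_statistic(counts, {"right": 0.5, "left": 0.5})
p_value = p_value_one_df(stat)

# Flag the deviation as a hypothesis to verify, as TDDA does.
if p_value < 0.05:
    print(f"Hypothesis: non-uniform distribution {dict(counts)}, "
          f"chi2={stat:.2f}, p={p_value:.4f}")
```

With these toy counts the deviation from uniformity is flagged at the 0.05 level. As in the abstract, such a flag is only a hypothesis and would still need to be checked against independent data. Domains with more than two terms have more than one degree of freedom and would need a general χ² tail function rather than `p_value_one_df`.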
Pages: 431 - 432
Page count: 2
Related papers
50 in total
  • [1] Term domain distribution analysis: A data mining tool for text databases
    Goldman, JA
    Chu, WW
    Parker, DS
    Goldman, RM
    METHODS OF INFORMATION IN MEDICINE, 1999, 38 (02) : 96 - 101
  • [2] Data mining method from text databases
    Kawano, M
    Watada, J
    Kawaura, T
    KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 3, PROCEEDINGS, 2005, 3683 : 1122 - 1128
  • [3] Dual Scaling in Data Mining from Text Databases
    Watada, Junzo
    Aoki, Keisuke
    Kawano, Masahiro
    Hitam, Muhammad Suzuri
    JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2006, 10 (04) : 453 - 459
  • [4] Data mining of text as a tool in authorship attribution
    Visa, A
    Toivonen, J
    Autio, S
    Mäkinen, J
    Back, B
    Vanharanta, H
    DATA MINING AND KNOWLEDGE DISCOVERY: THEORY, TOOLS AND TECHNOLOGY III, 2001, 4384 : 149 - 156
  • [5] Text data mining: A case study
    Ford, CW
    Chiang, CC
    Wu, H
    Chilka, RR
    Talburt, JR
    ITCC 2005: INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: CODING AND COMPUTING, VOL 1, 2005, : 122 - 127
  • [6] IBminer: A Text Mining Tool for Constructing and Populating InfoBox Databases and Knowledge Bases
    Mousavi, Hamid
    Gao, Shi
    Zaniolo, Carlo
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2013, 6 (12): : 1330 - 1333
  • [7] Data mining on text
    Clifton, C
    Steinheiser, R
    TWENTY-SECOND ANNUAL INTERNATIONAL COMPUTER SOFTWARE & APPLICATIONS CONFERENCE - PROCEEDINGS, 1998, : 630 - 635
  • [8] Text mining: powering the database revolution
    Udo Hahn
    Joachim Wermter
    Rainer Blasczyk
    Peter A. Horn
    Nature, 2007, 448 : 130 - 130
  • [9] Text mining: powering the database revolution
    Hahn, Udo
    Wermter, Joachim
    Blasczyk, Rainer
    Horn, Peter A.
    NATURE, 2007, 448 (7150) : 130 - 130
  • [10] Efficient mining of association rules in text databases
    Holt, JD
    Chung, SM
    PROCEEDINGS OF THE EIGHTH INTERNATIONAL CONFERENCE ON INFORMATION KNOWLEDGE MANAGEMENT, CIKM'99, 1999, : 234 - 242