TDDA, a data mining tool for text databases: A case history in a lung cancer text database

Authors
Goldman, JA [1]
Chu, W
Parker, DS
Goldman, RM
Affiliations
[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA
[2] Univ Osteopath Med & Hlth Sci, Des Moines, IA 50312 USA
Source
DISCOVERY SCIENCE | 1998 / Vol. 1532
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we give a case history illustrating the real-world application of a useful technique for data mining in text databases. The technique, Term Domain Distribution Analysis (TDDA), consists of keeping track of term frequencies for specific finite domains and announcing significant differences from the standard frequency distributions over these domains as hypotheses. In the case study presented, the domain of terms was the pair {right, left}, over which we expected a uniform distribution. In analyzing term frequencies in a thoracic lung cancer database, the TDDA technique led to the surprising discovery that primary thoracic lung cancer tumors appear in the right lung more often than in the left lung, with a ratio of 3:2. Treating the text discovery as a hypothesis, we verified this relationship against the medical literature in which primary lung tumor sites were reported, using a standard chi-square statistic. We subsequently developed a working theoretical model of lung cancer that may explain the discovery.
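The idea described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the tokenization by whole-word regular expression, and the toy records are all hypothetical, and the p-value helper covers only the two-term case (one degree of freedom), where the chi-square tail has a closed form via `math.erfc`.

```python
import math
import re
from collections import Counter


def term_domain_counts(documents, domain):
    """Count whole-word occurrences of each domain term across all documents.

    `domain` is assumed to be a collection of lowercase terms.
    """
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in domain) + r")\b",
        re.IGNORECASE,
    )
    counts = Counter({term: 0 for term in domain})
    for text in documents:
        for match in pattern.findall(text):
            counts[match.lower()] += 1
    return counts


def chi_square_statistic(counts, expected_probs):
    """Pearson goodness-of-fit statistic against an expected distribution."""
    total = sum(counts.values())
    return sum(
        (counts[term] - total * prob) ** 2 / (total * prob)
        for term, prob in expected_probs.items()
    )


def p_value_two_terms(statistic):
    """Upper-tail p-value for chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Invented toy records, for illustration only (not data from the paper).
toy_records = [
    "Mass in the right upper lobe.",
    "Lesion noted in the left lower lobe.",
    "Right hilar tumor.",
    "Tumor in the right middle lobe.",
    "Left upper lobe nodule.",
]

domain = ["right", "left"]
uniform = {term: 1.0 / len(domain) for term in domain}

counts = term_domain_counts(toy_records, domain)
statistic = chi_square_statistic(counts, uniform)
p_value = p_value_two_terms(statistic)

print(dict(counts), statistic, p_value)
```

With so few records the deviation from uniform is far from significant; a real analysis would need counts on the scale of a full database before a difference such as 3:2 could be announced as a hypothesis. For domains with more than two terms, a library routine such as `scipy.stats.chisquare` would be a more general choice than the one-degree-of-freedom helper used here.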
Pages: 431-432
Number of pages: 2