TDDA, a data mining tool for text databases: A case history in a lung cancer text database

Cited by: 0

Authors:

Goldman, JA [1]

Chu, W

Parker, DS

Goldman, RM

Affiliations:

[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA

[2] Univ Osteopath Med & Hlth Sci, Des Moines, IA 50312 USA

Source:

DISCOVERY SCIENCE | 1998 / Vol. 1532

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we give a case history illustrating the real-world application of a technique for data mining in text databases. The technique, Term Domain Distribution Analysis (TDDA), tracks term frequencies over specific finite domains and reports significant deviations from the standard frequency distributions expected over these domains as hypotheses. In the case study presented, the domain of terms was the pair {right, left}, over which we expected a uniform distribution. In analyzing term frequencies in a thoracic lung cancer database, TDDA led to the surprising discovery that primary thoracic lung cancer tumors appear in the right lung more often than in the left lung, at a ratio of 3:2. Treating the text discovery as a hypothesis, we verified this relationship against the medical literature in which primary lung tumor sites were reported, using a standard chi-square (χ²) statistic. We subsequently developed a working theoretical model of lung cancer that may explain the discovery.
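The core idea in the abstract (count the terms of a finite domain, then compare the counts with an expected distribution) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `domain_counts`, `chi_square_uniform`, and `p_value_two_terms` are hypothetical, the tokenization is a simple assumption, and the sample records are invented toy data rather than entries from the lung cancer database. The p-value formula used here is valid only for a two-term domain (one degree of freedom).

```python
import math
import re
from collections import Counter


def domain_counts(texts, domain):
    """Count whole-word occurrences of each domain term across all texts."""
    wanted = {term.lower() for term in domain}
    counts = Counter({term: 0 for term in wanted})
    for text in texts:
        # Assumption: lowercase alphabetic tokens are good enough for a sketch.
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in wanted:
                counts[token] += 1
    return counts


def chi_square_uniform(counts):
    """Chi-square goodness-of-fit statistic against a uniform distribution."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no domain terms were observed")
    expected = total / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts.values())


def p_value_two_terms(stat):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Invented toy records (NOT from the paper's database), built at a 3:2 ratio.
records = ["mass in the right upper lobe"] * 60 + ["nodule in the left lower lobe"] * 40

counts = domain_counts(records, {"right", "left"})
stat = chi_square_uniform(counts)
p = p_value_two_terms(stat)
print(dict(counts), stat, p)
```

A small p-value only flags the deviation as a hypothesis worth checking; as the abstract notes, the finding still has to be verified against independent sources.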

Pages: 431-432

Page count: 2
