TDDA, a data mining tool for text databases: A case history in a lung cancer text database

Cited by: 0

Authors:

Goldman, JA [1]

Chu, W

Parker, DS

Goldman, RM

Affiliations:

[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA

[2] Univ Osteopath Med & Hlth Sci, Des Moines, IA 50312 USA

Source:

DISCOVERY SCIENCE | 1998 / Vol. 1532

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we give a case history illustrating the real-world application of a useful technique for data mining in text databases. The technique, Term Domain Distribution Analysis (TDDA), consists of keeping track of term frequencies for specific finite domains and reporting significant differences from standard frequency distributions over these domains as hypotheses. In the case study presented, the domain of terms was the pair {right, left}, over which we expected a uniform distribution. In analyzing term frequencies in a thoracic lung cancer database, the TDDA technique led to the surprising discovery that primary thoracic lung cancer tumors appear in the right lung more often than in the left lung, with a ratio of 3:2. Treating the text discovery as a hypothesis, we verified this relationship against the medical literature in which primary lung tumor sites were reported, using a standard chi-square (χ²) statistic. We subsequently developed a working theoretical model of lung cancer that may explain the discovery.
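
The core idea in the abstract, counting terms from a small finite domain and comparing the counts with an expected distribution, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the whole-word case-insensitive matching rule, and the toy sentences are all hypothetical, and the p-value helper is valid only for a two-term domain (one degree of freedom).

```python
import math
import re
from collections import Counter


def term_domain_counts(texts, domain):
    """Count whole-word, case-insensitive occurrences of each domain term.

    Assumption: `domain` holds lowercase single-word terms.
    """
    counts = Counter({term: 0 for term in domain})
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in domain) + r")\b",
        re.IGNORECASE,
    )
    for text in texts:
        for match in pattern.findall(text):
            counts[match.lower()] += 1
    return counts


def chi_square_uniform(counts):
    """Chi-square goodness-of-fit statistic against a uniform distribution."""
    total = sum(counts.values())
    expected = total / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts.values())


def p_value_two_terms(statistic):
    """Upper-tail p-value for 1 degree of freedom (two-term domain only)."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Toy, made-up sentences for illustration only (not data from the paper).
toy_texts = [
    "Mass in the right upper lobe.",
    "Left lower lobe nodule.",
    "Right hilar mass, right effusion.",
]
toy_counts = term_domain_counts(toy_texts, ["right", "left"])
toy_stat = chi_square_uniform(toy_counts)
toy_p = p_value_two_terms(toy_stat)
```

A small p-value would flag the observed split as a hypothesis worth checking against independent evidence, which is how the abstract describes the right-versus-left finding being treated. With only a handful of toy sentences the test has little power, so a real analysis would need counts from a full database.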


Pages: 431-432

Number of pages: 2
