TDDA, a data mining tool for text databases: A case history in a lung cancer text database

Cited by: 0

Authors:

Goldman, JA [1]

Chu, W

Parker, DS

Goldman, RM

Affiliations:

[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA

[2] Univ Osteopath Med & Hlth Sci, Des Moines, IA 50312 USA

Source:

DISCOVERY SCIENCE | 1998 / Vol. 1532

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we give a case history illustrating the real-world application of a useful technique for data mining in text databases. The technique, Term Domain Distribution Analysis (TDDA), consists of tracking term frequencies over specific finite domains and reporting significant deviations from the standard frequency distributions over these domains as hypotheses. In the case study presented, the domain of terms was the pair {right, left}, over which we expected a uniform distribution. In analyzing term frequencies in a thoracic lung cancer database, the TDDA technique led to the surprising discovery that primary thoracic lung cancer tumors appear in the right lung more often than in the left lung, with a ratio of 3:2. Treating the text discovery as a hypothesis, we verified this relationship against the medical literature in which primary lung tumor sites were reported, using a standard chi-square (χ²) statistic. We subsequently developed a working theoretical model of lung cancer that may explain the discovery.
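
The idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`term_domain_counts`, `chi_square_uniform`, `p_value_two_terms`), the simple regex tokenizer, and the toy reports are all assumptions made for illustration, and the reports are invented text, not data from the paper. The sketch counts the terms of the finite domain {right, left} and computes a chi-square goodness-of-fit statistic against the uniform distribution the abstract says was expected.

```python
import math
import re
from collections import Counter


def term_domain_counts(texts, domain):
    """Count how often each term of a finite domain occurs in the texts.

    Hypothetical helper: tokenization is a plain lowercase word split.
    """
    wanted = {term.lower() for term in domain}
    counts = Counter({term: 0 for term in wanted})
    for text in texts:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in wanted:
                counts[token] += 1
    return counts


def chi_square_uniform(counts):
    """Chi-square goodness-of-fit statistic against a uniform distribution."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no domain terms were observed")
    expected = total / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts.values())


def p_value_two_terms(statistic):
    """Upper-tail p-value for 1 degree of freedom (two-term domains only)."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Invented toy reports for illustration only; NOT data from the paper.
reports = [
    "Mass in the right upper lobe.",
    "Nodule in the left lower lobe.",
    "Right hilar lesion; right pleural effusion.",
    "Tumor of the right main bronchus.",
    "Left apical opacity.",
]

counts = term_domain_counts(reports, {"right", "left"})
statistic = chi_square_uniform(counts)
print(dict(counts), statistic, p_value_two_terms(statistic))
```

With so few toy reports the deviation from uniform is not statistically significant; a skew only becomes a reportable hypothesis once the counts are large enough for the statistic to exceed the chosen critical value (about 3.84 at the 0.05 level for one degree of freedom).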


Pages: 431-432

Page count: 2

