Mining significant associations in large scale text corpora

Cited by: 2
Authors
Raghavan, P
Tsaparas, P
DOI
10.1109/ICDM.2002.1183933
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Mining large-scale text corpora is an essential step in extracting the key themes in a corpus. We motivate a quantitative measure for significant associations through the distributions of pairs and triplets of co-occurring words. We consider the algorithmic problem of efficiently enumerating such significant associations and present pruning algorithms for these problems, with theoretical as well as empirical analyses. Our algorithms make use of two novel mining methods: (1) matrix mining, and (2) shortened documents. We present evidence from a diverse set of documents that our measure does in fact elicit interesting co-occurrences.
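To make the idea of scoring co-occurring word pairs concrete, here is a minimal sketch. It is not the measure or the pruning method from the paper, which the abstract does not specify. As an assumption, it scores each pair by the ratio of observed to expected document co-occurrence under independence (often called lift) and prunes pairs below a minimum support. The function name `pair_scores`, the `min_support` parameter, and the toy documents are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_scores(docs, min_support=2):
    """Score word pairs by observed / expected document co-occurrence.

    Assumption: lift is used as a stand-in significance measure; the
    paper's own measure is not given in the abstract.
    """
    n = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing both words
    for doc in docs:
        # Use the set of distinct words so each document counts once.
        words = sorted(set(doc.lower().split()))
        word_df.update(words)
        pair_df.update(combinations(words, 2))

    scores = {}
    for (a, b), joint in pair_df.items():
        if joint < min_support:  # simple support-based pruning
            continue
        # Expected joint document count if a and b were independent.
        expected = word_df[a] * word_df[b] / n
        scores[(a, b)] = joint / expected
    return scores


docs = [
    "data mining finds patterns",
    "text mining finds themes",
    "data mining scales well",
    "cats sleep all day",
]
scores = pair_scores(docs)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

A score above 1 means the pair appears together more often than independence would predict. Extending the sketch to triplets would replace `combinations(words, 2)` with `combinations(words, 3)`, at a much higher enumeration cost, which is the kind of cost that pruning is meant to control.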
Pages: 402-409
Number of pages: 8