OLAP on Multidimensional Text Databases: Topic Network Cube and its Applications

Authors
Zhang, Zhiyuan [1]
Wang, Hong [1]
Feng, Xingjie [1]
Affiliations
[1] Civil Aviation University of China, School of Computer Science and Technology, Tianjin, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
multidimensional text database; topic network cube; OLAP; text mining; complex network
DOI
10.2298/FIL1805973Z
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
Multidimensional text data contains both structured attributes and unstructured text. Unlike traditional numerical data, it does not lend itself directly to online analytical processing (OLAP). Although OLAP methods such as the topic cube have been proposed to exploit both the structured information and the text, these methods cannot capture the relations between topic words. Considering that a topic usually consists of several subtopics and each subtopic usually contains several topic words, we use a topic network, in which related topic words are connected, to express the complex relations within topics. This paper introduces the concept of a topic network cube for performing OLAP analysis on multidimensional text databases. First, we propose a method called GL-LDA, based on the Gibbs sampling outputs of Labeled LDA, to measure the relations between topic words. Second, we give a storage model for the topic network cube that can efficiently generate topic networks using GL-LDA. Third, we show how to perform OLAP analysis on the topic network cube. Experimental results show that multidimensional text databases can be analyzed easily and effectively at different granularities using just a few simple SQL statements, and that the output network provides rich and useful information about topics.
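The pipeline outlined in the abstract (measure word relations from topic assignments, store them per cube cell, aggregate with SQL) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not define GL-LDA's relation measure, so same-topic co-occurrence within a document stands in for it, and the table `edge_cell`, the dimensions `year` and `region`, and the sample data are hypothetical rather than taken from the paper.

```python
import sqlite3
from collections import Counter
from itertools import combinations

# Hypothetical sampler output: one record per document, with structured
# dimensions and a (word, topic) assignment for each token.
docs = [
    {"year": 2016, "region": "north",
     "tokens": [("engine", "fault"), ("vibration", "fault"), ("delay", "schedule")]},
    {"year": 2016, "region": "south",
     "tokens": [("engine", "fault"), ("vibration", "fault")]},
    {"year": 2017, "region": "north",
     "tokens": [("engine", "fault"), ("fuel", "fault"), ("delay", "schedule")]},
]

def cooccurrence_edges(tokens):
    """Count word pairs assigned to the same topic within one document.

    This is a stand-in relation measure, not the paper's GL-LDA formula.
    """
    words_by_topic = {}
    for word, topic in tokens:
        words_by_topic.setdefault(topic, set()).add(word)
    edges = Counter()
    for topic, words in words_by_topic.items():
        for word_a, word_b in combinations(sorted(words), 2):
            edges[(topic, word_a, word_b)] += 1
    return edges

def build_cube(documents):
    """Store one weighted edge row per (dimension values, topic, word pair)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE edge_cell (year INTEGER, region TEXT, topic TEXT, "
        "word_a TEXT, word_b TEXT, weight INTEGER)"
    )
    for doc in documents:
        for (topic, word_a, word_b), weight in cooccurrence_edges(doc["tokens"]).items():
            conn.execute(
                "INSERT INTO edge_cell VALUES (?, ?, ?, ?, ?, ?)",
                (doc["year"], doc["region"], topic, word_a, word_b, weight),
            )
    return conn

def roll_up_by_year(conn, year):
    """Roll up over region: sum edge weights for one year across all regions."""
    return conn.execute(
        "SELECT topic, word_a, word_b, SUM(weight) FROM edge_cell "
        "WHERE year = ? GROUP BY topic, word_a, word_b "
        "ORDER BY topic, word_a, word_b",
        (year,),
    ).fetchall()

conn = build_cube(docs)
network_2016 = roll_up_by_year(conn, 2016)
network_2017 = roll_up_by_year(conn, 2017)
```

Each returned row is a weighted edge of the topic network for the selected cell; dropping or adding columns in the `WHERE` and `GROUP BY` clauses corresponds to roll-up and drill-down along the structured dimensions.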
Pages: 1973-1982
Number of pages: 10