Multidimensional analysis model for a document warehouse that includes textual measures

Cited by: 6
Authors
Mendoza, Martha [1, 2]
Alegria, Erwin [1]
Maca, Manuel [1]
Cobos, Carlos [1, 2]
Leon, Elizabeth [3]
Affiliations
[1] Univ Cauca, Informat Technol Res Grp GTI, Popayan, Colombia
[2] Univ Cauca, Elect & Telecommun Engn Fac, Popayan, Colombia
[3] Univ Nacl Colombia, Fac Engn, Medellin, Antioquia, Colombia
Keywords
Document warehouse; OLAP; Textual measures; Text warehouse; ALGORITHM;
DOI
10.1016/j.dss.2015.02.008
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Data warehouses and On-Line Analytical Processing (OLAP) tools together permit multidimensional analysis of structured data. However, as business systems are increasingly required to handle substantial quantities of unstructured textual information, the need arises for a similarly effective means of analysis. To manage unstructured text data stored in data warehouses, a new multidimensional analysis model is proposed that includes textual measures as well as a topic hierarchy. In this model, the textual measures that associate the topics with the text documents are generated by Probabilistic Latent Semantic Analysis, while the hierarchy is created automatically using a clustering algorithm. Documents can then be queried using OLAP tools. The model was evaluated from two viewpoints: query execution time and user satisfaction. Execution time was evaluated on scientific articles using two query types, and user satisfaction (with query time and ease of use) was evaluated using statistical frequency and multivariate analyses. Encouraging observations included that, as the number of documents increases, query time grows linearly rather than exponentially. In addition, acceptance of the model increased with use, and the visualization of the model was also well received by users. (C) 2015 Elsevier B.V. All rights reserved.
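The abstract names Probabilistic Latent Semantic Analysis (PLSA) as the source of the textual measures but does not give the procedure. The following is a minimal sketch of the general technique only, not the authors' implementation: it fits PLSA by expectation-maximization on a toy document-term count matrix and then rolls the per-document topic weights up along a "year" dimension. The function names `plsa` and `roll_up`, the toy counts, the year labels, and the choice of averaging as the aggregation are all assumptions made for illustration; the clustering step that builds the topic hierarchy is not shown.

```python
import random


def normalize(row):
    """Scale a list of non-negative numbers to sum to 1 (uniform if all zero)."""
    total = sum(row)
    if total <= 0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a document-term count matrix (list of lists).

    Returns (p_z_d, p_w_z): P(topic | document) and P(word | topic).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random, strictly positive initialisation of both distributions.
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z | d, w) is proportional to P(z|d) * P(w|z).
                post = normalize([p_z_d[d][z] * p_w_z[z][w]
                                  for z in range(n_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    new_z_d[d][z] += n * post[z]
                    new_w_z[z][w] += n * post[z]
        # M-step: renormalise the expected counts into distributions.
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z


def roll_up(p_z_d, dim_values):
    """Average per-document topic weights within each dimension member.

    Averaging is an assumed aggregation, chosen only for illustration.
    """
    groups = {}
    for row, key in zip(p_z_d, dim_values):
        groups.setdefault(key, []).append(row)
    return {key: [sum(col) / len(rows) for col in zip(*rows)]
            for key, rows in groups.items()}


# Toy data (hypothetical): 4 documents, 4 vocabulary terms, 2 topics.
counts = [[4, 3, 0, 0],
          [5, 2, 0, 0],
          [0, 0, 3, 4],
          [0, 0, 2, 5]]
years = [2013, 2013, 2014, 2014]

p_z_d, p_w_z = plsa(counts, n_topics=2)
by_year = roll_up(p_z_d, years)
for year, weights in sorted(by_year.items()):
    print(year, [round(x, 3) for x in weights])
```

In this sketch each row of `p_z_d` plays the role of a textual measure attached to one document, and `roll_up` stands in for an OLAP aggregation over a conventional dimension. A real document warehouse would store these weights in a fact table and let the OLAP engine perform the aggregation.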
Pages: 44-59
Number of pages: 16
Related papers
50 in total
  • [21] Empirical validation of metrics for object oriented multidimensional model for data warehouse
    Gosain A.
    Mann S.
    [J]. International Journal of System Assurance Engineering and Management, 2014, 5 (3) : 262 - 275
  • [22] Building a Semantic Model of a Textual Document for Efficient Search and Retrieval
    Nyamsuren, Enkhbold
    Choi, Ho-Jin
    [J]. 11TH INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION TECHNOLOGY, VOLS I-III, PROCEEDINGS: UBIQUITOUS ICT CONVERGENCE MAKES LIFE BETTER!, 2009, : 298 - 302
  • [23] Diamond multidimensional model and aggregation operators for document OLAP
    Azabou, Maha
    Khrouf, Kais
    Feki, Jamel
    Soule-Dupuy, Chantal
    Valles, Nathalie
    [J]. 2015 IEEE 9TH INTERNATIONAL CONFERENCE ON RESEARCH CHALLENGES IN INFORMATION SCIENCE (RCIS), 2015, : 363 - 373
  • [24] Cloud-based textual analysis as a basis for document classification
    Weir, George R. S.
    Owoeye, Kolade
    Oberacker, Alice
    Alshahrani, Haya
    [J]. PROCEEDINGS 2018 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS), 2018, : 672 - 676
  • [25] Empirical analysis of metrics for object oriented multidimensional model of data warehouse using unsupervised machine learning techniques
    Sabharwal S.
    Nagpal S.
    Aggarwal G.
    [J]. International Journal of System Assurance Engineering and Management, 2017, 8 (Suppl 2) : 703 - 715
  • [26] Theoretical Validation of Object-Oriented Metrics for Data Warehouse Multidimensional Model
    Gosain, Anjana
    Gupta, Rakhi
    [J]. PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE ON FRONTIERS IN INTELLIGENT COMPUTING: THEORY AND APPLICATIONS, FICTA 2016, VOL 1, 2017, 515 : 681 - 691
  • [27] Comparative Analysis of Similarity Measures in Document Clustering
    Karun, Kavitha A.
    Philip, Mintu
    Lubna, K.
    [J]. 2013 INTERNATIONAL CONFERENCE ON GREEN COMPUTING, COMMUNICATION AND CONSERVATION OF ENERGY (ICGCE), 2013, : 857 - 860
  • [28] Empirical Investigation of Metrics for Multidimensional Model of Data Warehouse Using Support Vector Machine
    Sabharwal, Sangeeta
    Nagpal, Sushama
    Aggarwal, Gargi
    [J]. 2015 4TH INTERNATIONAL CONFERENCE ON RELIABILITY, INFOCOM TECHNOLOGIES AND OPTIMIZATION (ICRITO) (TRENDS AND FUTURE DIRECTIONS), 2015,
  • [29] USING A TREE MODEL IN TEXTUAL ANALYSIS
    LUONG, NX
    [J]. COMPUTERS AND THE HUMANITIES, 1989, 23 (4-5) : 397 - 402
  • [30] A Proposed Textual Graph Based Model for Arabic Multi-document Summarization
    Alwan, Muneer A.
    Onsi, Hoda M.
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2016, 7 (06) : 435 - 439