Terms Mining in Document-Based NoSQL: Response to Unstructured Data

Cited by: 3
Authors
Lomotey, Richard K. [1 ]
Deters, Ralph [1 ]
Affiliation
[1] Univ Saskatchewan, Dept Comp Sci, Saskatoon, SK S7N 0W0, Canada
Keywords
Unstructured Data Mining; Big Data; Viterbi algorithm; Terms; NoSQL; Association Rules; classification; clustering
DOI
10.1109/BigData.Congress.2014.99
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Unstructured data mining has become topical recently due to the availability of high-dimensional and voluminous digital content (known as "Big Data") across the enterprise spectrum. Relational Database Management Systems (RDBMS) have been employed over the past decades for content storage and management, but the ever-growing heterogeneity in today's data calls for a new storage approach. Thus, NoSQL databases have emerged as the preferred storage facility, since they support unstructured data storage. This creates the need to explore efficient data mining techniques for such NoSQL systems, since the available tools and frameworks, which are designed for RDBMS, are often not directly applicable. In this paper, we focus on topic and term mining, based on clustering, in document-based NoSQL. This is achieved by adapting the architectural design of an analytics-as-a-service framework and by proposing the Viterbi algorithm to enhance the accuracy of term classification in the system. The results from the pilot testing of our work show higher accuracy in comparison to some previously proposed techniques, such as parallel search.
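The abstract names the Viterbi algorithm but does not describe how it is applied. As a rough illustration only, the following is a minimal sketch of textbook Viterbi decoding over a hidden Markov model, not the authors' implementation. The state labels ("topic", "other"), the toy probabilities, the sample terms, and the `unseen` fallback for terms missing from the emission table are all assumptions made for this example; the observation sequence is assumed to be non-empty.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi(
    observations: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
    unseen: float = 1e-6,  # assumed fallback probability for unknown terms
) -> Tuple[List[str], float]:
    """Return the most probable state sequence and its probability."""
    # best[t][s] = (probability of the best path ending in s at step t,
    #               predecessor state on that path)
    best = [
        {s: (start_p[s] * emit_p[s].get(observations[0], unseen), None)
         for s in states}
    ]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            # Choose the predecessor r that maximises the path probability into s.
            p, r = max((prev[r][0] * trans_p[r][s], r) for r in states)
            column[s] = (p * emit_p[s].get(obs, unseen), r)
        best.append(column)

    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, prob


# Hypothetical toy model: label each term as a "topic" term or "other".
states = ("topic", "other")
start_p = {"topic": 0.5, "other": 0.5}
trans_p = {
    "topic": {"topic": 0.7, "other": 0.3},
    "other": {"topic": 0.3, "other": 0.7},
}
emit_p = {
    "topic": {"nosql": 0.5, "database": 0.4, "the": 0.1},
    "other": {"nosql": 0.05, "database": 0.15, "the": 0.8},
}

path, prob = viterbi(["the", "nosql", "database"], states, start_p, trans_p, emit_p)
print(path, prob)  # path == ['other', 'topic', 'topic'], prob is about 0.0168
```

In a real setting the transition and emission tables would be estimated from the stored documents rather than written by hand, and log-probabilities would usually replace the raw products to avoid underflow on long term sequences.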
Pages: 661 - 668
Page count: 8
Related papers
50 in total
  • [31] OnTheFly: a tool for automated document-based text annotation, data linking and network generation
    Pavlopoulos, Georgios A.
    Pafilis, Evangelos
    Kuhn, M.
    Hooper, Sean D.
    Schneider, Reinhard
    BIOINFORMATICS, 2009, 25 (07) : 977 - 978
  • [32] The Credibility of Public and Private Signals: A Document-Based Approach
    Katagiri, Azusa
    Min, Eric
    AMERICAN POLITICAL SCIENCE REVIEW, 2019, 113 (01) : 156 - 172
  • [33] Exploiting Document-Based Features for Clarification in Conversational Search
    Sekulic, Ivan
    Aliannejadi, Mohammad
    Crestani, Fabio
    ADVANCES IN INFORMATION RETRIEVAL, PT I, 2022, 13185 : 413 - 427
  • [34] Developing and Implementing an Interoperable Document-based Electronic Health Record
    Campos, Fernando
    Plazzotta, Fernando
    Luna, Daniel
    Baum, Analia
    Bernaldo de Quiros, Fernan Gonzalez
    MEDINFO 2013: PROCEEDINGS OF THE 14TH WORLD CONGRESS ON MEDICAL AND HEALTH INFORMATICS, PTS 1 AND 2, 2013, 192 : 1169 - 1169
  • [35] Document-oriented Models for Data Warehouses: NoSQL Document-oriented for Data Warehouses
    Chevalier, Max
    El Malki, Mohammed
    Kopliku, Arlind
    Teste, Olivier
    Tournier, Ronan
    PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS, VOL 1 (ICEIS), 2016, : 142 - 149
  • [36] Combined Document/Business Card Detector for Proactive Document-Based Services on the Smartphone
    Kim, Yong-Joong
    Kim, Yonghyun
    Kang, Bong-Nam
    Kim, Daijin
    NEURAL INFORMATION PROCESSING, ICONIP 2015, PT IV, 2015, 9492 : 393 - 402
  • [37] Warehousing structured and unstructured data for data mining
    Miller, LL
    Honavar, V
    Barta, T
    ASIS '97 - PROCEEDINGS OF THE 60TH ASIS ANNUAL MEETING, VOL 34 1997, 1997, 34 : 215 - 224
  • [38] Warehousing structured and unstructured data for data mining
    Miller, LL
    Honavar, V
    Barta, T
    PROCEEDINGS OF THE ASIS ANNUAL MEETING, 1997, 34 : 215 - 224
  • [39] A Document-Based Approach to Monitor Business Process Instances
    AbuSafiya, Majed
    Mazumdar, Subhasish
    ENTERPRISE MODELLING AND INFORMATION SYSTEMS ARCHITECTURES-AN INTERNATIONAL JOURNAL, 2008, 3 (02) : 54 - 64
  • [40] Con Job: An Estimate of Ex-Felon Voter Turnout Using Document-Based Data
    Haselswerdt, Michael V.
    SOCIAL SCIENCE QUARTERLY, 2009, 90 (02) : 262 - 273