Terms Mining in Document-Based NoSQL: Response to Unstructured Data

Cited by: 3
Authors
Lomotey, Richard K. [1 ]
Deters, Ralph [1 ]
Affiliations
[1] Univ Saskatchewan, Dept Comp Sci, Saskatoon, SK S7N 0W0, Canada
Keywords
Unstructured Data Mining; Big Data; Viterbi algorithm; Terms; NoSQL; Association Rules; classification; clustering
DOI
10.1109/BigData.Congress.2014.99
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Unstructured data mining has become topical recently due to the availability of high-dimensional and voluminous digital content (known as "Big Data") across the enterprise spectrum. Relational Database Management Systems (RDBMS) have been employed over the past decades for content storage and management, but the ever-growing heterogeneity of today's data calls for a new storage approach. NoSQL databases have therefore emerged as the preferred storage facility, since they support unstructured data storage. This creates the need to explore efficient techniques for mining data from such NoSQL systems, since the available tools and frameworks, which are designed for RDBMS, are often not directly applicable. In this paper, we focus on topics and terms mining, based on clustering, in document-based NoSQL. This is achieved by adapting the architectural design of an analytics-as-a-service framework and by proposing the Viterbi algorithm to enhance the accuracy of the terms classification in the system. The results from the pilot testing of our work show higher accuracy in comparison to some previously proposed techniques such as the parallel search.
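The abstract names the Viterbi algorithm but does not describe the model it is applied to. As a generic illustration only, and not the authors' implementation, the following minimal Python sketch shows textbook Viterbi decoding over a hidden Markov model. The function name `viterbi`, the two labels ("topic", "filler"), the tokens, and all probabilities are hypothetical values chosen for the example.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def viterbi(
    observations: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> Tuple[List[str], float]:
    """Return the most probable state sequence and its probability."""
    if not observations:
        raise ValueError("observations must not be empty")

    # Each column maps a state to (best path probability, previous state).
    Column = Dict[str, Tuple[float, Optional[str]]]
    first: Column = {
        s: (start_p[s] * emit_p[s].get(observations[0], 0.0), None)
        for s in states
    }
    columns: List[Column] = [first]

    for obs in observations[1:]:
        prev = columns[-1]
        column: Column = {}
        for s in states:
            # Pick the predecessor that maximises the path probability into s.
            column[s] = max(
                (
                    (prev[r][0] * trans_p[r][s] * emit_p[s].get(obs, 0.0), r)
                    for r in states
                ),
                key=lambda pair: pair[0],
            )
        columns.append(column)

    # Backtrack from the best final state through the stored predecessors.
    last = max(states, key=lambda s: columns[-1][s][0])
    best_prob = columns[-1][last][0]
    path = [last]
    for column in reversed(columns[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, best_prob


# Hypothetical toy model: label each token as a "topic" term or "filler".
STATES = ["topic", "filler"]
START = {"topic": 0.6, "filler": 0.4}
TRANS = {
    "topic": {"topic": 0.7, "filler": 0.3},
    "filler": {"topic": 0.4, "filler": 0.6},
}
EMIT = {
    "topic": {"nosql": 0.5, "database": 0.4, "the": 0.1},
    "filler": {"nosql": 0.1, "database": 0.3, "the": 0.6},
}

if __name__ == "__main__":
    labels, prob = viterbi(["nosql", "database", "the"], STATES, START, TRANS, EMIT)
    print(labels, prob)
```

With these made-up numbers, the tokens "nosql", "database", "the" decode to the labels topic, topic, filler. In a real system the probabilities would have to be estimated from data, which the abstract does not detail.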
Pages: 661-668
Page count: 8