Data Mining from NoSQL Document-Append Style Storages

Cited by: 1
Authors
Lomotey, Richard K. [1]
Deters, Ralph [1]
Affiliation
[1] Univ Saskatchewan, Dept Comp Sci, Saskatoon, SK S7N 0W0, Canada
Keywords
Data mining; NoSQL; Bayesian Rule; Unstructured data; Apriori; Big Data;
DOI
10.1109/ICWS.2014.62
Chinese Library Classification number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
The modern data economy, often described as "Big Data", has changed the status quo of digital content creation and storage. While data storage has followed the schema-dictated approach for decades, recent digital content is largely unstructured, which creates the need for different storage techniques. NoSQL database systems have therefore been proposed to accommodate most of the content being generated today. One class of NoSQL database that has received significant enterprise adoption is the document-append style store. The emerging concern, however, is that research and tools that can aid data mining from such NoSQL databases are generally lacking. Even though document-append style stores expose data as Web services and over URLs/URIs, building a corresponding data mining tool deviates from the techniques that underlie web crawlers. Existing data mining tools designed for schema-based storage (e.g., RDBMS) are also a poor fit. Hence, our goal in this work is to design a data analytics tool that enables knowledge discovery through information retrieval from document-append style storage. The tool is algorithmically built on an inference-based Apriori, which helps us reduce the search duration. Preliminary test results also show that the proposed tool achieves high accuracy in comparison with previously proposed approaches.
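As an illustration of the Apriori technique named in the abstract, the following is a minimal sketch of frequent-itemset mining over schema-less, JSON-like documents. It is not the authors' tool: the Bayesian inference component is not modeled, and the function names (`apriori`, `document_terms`), the whitespace tokenization, and the sample documents are all assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


def document_terms(doc):
    """Flatten one JSON-like document into a set of lowercase terms.

    Hypothetical tokenization: string values are split on whitespace;
    keys and non-string values are ignored.
    """
    terms = set()
    stack = [doc]
    while stack:
        value = stack.pop()
        if isinstance(value, dict):
            stack.extend(value.values())
        elif isinstance(value, list):
            stack.extend(value)
        elif isinstance(value, str):
            terms.update(value.lower().split())
    return terms


# Made-up documents with differing structure, as in a document store.
docs = [
    {"title": "NoSQL mining", "tags": ["apriori", "bayes"]},
    {"title": "Mining NoSQL stores", "body": {"note": "apriori"}},
    {"title": "Relational schema", "tags": ["sql"]},
]
frequent = apriori([document_terms(d) for d in docs], min_support=2)
```

The prune step is what shortens the search: a candidate itemset is counted against the documents only if all of its smaller subsets were already found to be frequent.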
Pages: 385-392
Page count: 8