Unstructured Data Extraction in Distributed NoSQL

Authors
Lomotey, Richard K. [1]
Deters, Ralph [1]
Affiliation
[1] Univ Saskatchewan, Dept Comp Sci, Saskatoon, SK S7N 0W0, Canada
Keywords
Unstructured data; big data; Hidden Markov Model (HMM); term extraction; NoSQL; re-usable dictionary; association rules
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
While "big data" has brought good tidings in the form of easy access to voluminous data, it also brings challenges. The existing Knowledge Discovery in Databases (KDD) processes, which were proposed for schema-oriented data sources, are no longer applicable because today's data is largely unstructured. Previously, we deployed a tool called TouchR, which relies on the Hidden Markov Model (HMM) to extract terms from unstructured data sources (specifically, NoSQL databases). This paper advances that initially deployed version by introducing a re-usable dictionary and association rules to improve the quality of the extracted terms. In addition, the tool in its present stage adapts better to the user's search, based on the most frequently searched term.
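The abstract names the ingredients (HMM-based term extraction, a re-usable dictionary, and adaptation to the most frequently searched term) without describing how they interact. The Python sketch below is one minimal way such pieces could fit together; it is not TouchR's implementation. The two-state model (TERM/OTHER), every probability, the token features ("known", "capitalized", "plain"), the whitespace tokenizer, and the names `observe`, `viterbi`, and `TermExtractor` are all hypothetical. Association rules are not modeled.

```python
import math
from collections import Counter

# Hypothetical two-state HMM: each token is either part of a TERM or OTHER text.
STATES = ("TERM", "OTHER")
START = {"TERM": 0.3, "OTHER": 0.7}
TRANS = {
    "TERM": {"TERM": 0.4, "OTHER": 0.6},
    "OTHER": {"TERM": 0.3, "OTHER": 0.7},
}
# Emissions over coarse token features instead of raw words (assumption).
EMIT = {
    "TERM": {"known": 0.45, "capitalized": 0.5, "plain": 0.05},
    "OTHER": {"known": 0.05, "capitalized": 0.1, "plain": 0.85},
}


def observe(token, dictionary):
    """Map a token to an observation symbol; dictionary hits are strong evidence."""
    if token.lower() in dictionary:
        return "known"
    if token[:1].isupper():
        return "capitalized"
    return "plain"


def viterbi(tokens, dictionary):
    """Return the most likely TERM/OTHER state for each token."""
    if not tokens:
        return []
    obs = [observe(t, dictionary) for t in tokens]
    # Work in log space so long inputs do not underflow.
    scores = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    backpointers = []
    for o in obs[1:]:
        new_scores, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for landing in state s at this position.
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = scores[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][o])
            ptr[s] = prev
        scores = new_scores
        backpointers.append(ptr)
    # Trace the best path backwards from the highest-scoring final state.
    state = max(STATES, key=scores.get)
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return path[::-1]


class TermExtractor:
    """Hypothetical wrapper: HMM tagging plus a re-usable term dictionary."""

    def __init__(self):
        self.dictionary = set()         # terms found earlier, reused as evidence
        self.search_counts = Counter()  # how often each term was searched

    def extract(self, text):
        tokens = text.split()  # naive whitespace tokenizer (assumption)
        path = viterbi(tokens, self.dictionary)
        terms = [tok.lower() for tok, state in zip(tokens, path) if state == "TERM"]
        self.dictionary.update(terms)  # make new terms available to later documents
        return terms

    def record_search(self, term):
        self.search_counts[term.lower()] += 1

    def top_search(self):
        """Most frequently searched term, or None if nothing was searched yet."""
        common = self.search_counts.most_common(1)
        return common[0][0] if common else None


if __name__ == "__main__":
    ex = TermExtractor()
    print(ex.extract("the Diabetes record"))  # ['diabetes']
    # The lowercase mention is tagged only because the dictionary remembers it.
    print(ex.extract("diabetes screening"))   # ['diabetes']
    ex.record_search("diabetes")
    print(ex.top_search())                    # diabetes
```

Feeding earlier extractions back in as a "known" observation is one simple way a re-usable dictionary can raise recall on later documents: a fresh extractor tags nothing in "diabetes screening", while one that has already seen "Diabetes" recognizes it. A real system would estimate the probabilities from data (for example by counting over labeled text, or with Baum-Welch) rather than fixing them by hand.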
Pages: 160-165
Number of pages: 6