Unstructured Data Extraction in Distributed NoSQL

Cited by: 0
Authors
Lomotey, Richard K. [1 ]
Deters, Ralph [1 ]
Affiliations
[1] Univ Saskatchewan, Dept Comp Sci, Saskatoon, SK S7N 0W0, Canada
Keywords
Unstructured data; big data; Hidden Markov Model (HMM); terms extraction; NoSQL; Re-usable dictionary; Association rules;
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
While "Big data" has brought good tidings in terms of easy access to voluminous data, it has brought challenges too. The existing Knowledge Discovery in Databases (KDD) processes, which were proposed for schema-oriented data sources, are no longer applicable since today's data is unstructured. Previously, we deployed a tool called TouchR, which relies on the Hidden Markov Model (HMM) to extract terms from unstructured data sources (specifically, NoSQL databases). This paper advances the initially deployed version by introducing a re-usable dictionary and association rules to improve the quality of the extracted terms. In addition, the tool at its present stage adapts better to user searches based on the most frequently searched term.
Pages: 160 - 165
Page count: 6
Related papers
50 in total
  • [41] Enhancing Data Security in Cloud using Random Pattern Fragmentation and a Distributed NoSQL Database
    Santos, Nelson L.
    Ghita, Bogdan
    Masala, Giovanni L.
    2019 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC), 2019 : 3735 - 3740
  • [42] Key technologies of a distributed and unstructured water resources big data system
    Dong, Yuan
    Xiao, D.
    Hu, BaoQing
    Zhang, ShiLun
    Liang, JiaHai
    Nong, GuoCai
    Liu, ZhiXian
    Zhao, RongYang
    Liu, MeiXing
    Xu, ZhenHua
    Tao, Jin
    Deng, Kai
    Zhou, Li
    Han, Xin
    DESALINATION AND WATER TREATMENT, 2018, 122 : 36 - 41
  • [43] Distributed real-time ETL architecture for unstructured big data
    Erum Mehmood
    Tayyaba Anees
    Knowledge and Information Systems, 2022, 64 : 3419 - 3445
  • [44] Distributed real-time ETL architecture for unstructured big data
    Mehmood, Erum
    Anees, Tayyaba
    KNOWLEDGE AND INFORMATION SYSTEMS, 2022, 64 (12) : 3419 - 3445
  • [45] Towards an efficient parallel raycasting of unstructured volumetric data on distributed environments
    Bentes, Cristiana
    Labronici, Bernardo B.
    Drummond, Lucia M. A.
    Farias, Ricardo
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2014, 17 (02) : 423 - 439
  • [46] Towards an efficient parallel raycasting of unstructured volumetric data on distributed environments
    Cristiana Bentes
    Bernardo B. Labronici
    Lúcia M. A. Drummond
    Ricardo Farias
    Cluster Computing, 2014, 17 : 423 - 439
  • [47] NoSQL data management systems
    Kuznetsov, S. D.
    Poskonin, A. V.
    PROGRAMMING AND COMPUTER SOFTWARE, 2014, 40 (06) : 323 - 332
  • [48] RSenter: Tool for Topics and Terms Extraction from Unstructured Data Debris
    Lomotey, Richard K.
    Deters, Ralph
    2013 IEEE INTERNATIONAL CONGRESS ON BIG DATA, 2013 : 395 - 402
  • [49] An analytical study of information extraction from unstructured and multidimensional big data
    Adnan, Kiran
    Akbar, Rehan
    JOURNAL OF BIG DATA, 2019, 6 (01)
  • [50] Causal-Pdh: Causal Consistency Model for NoSQL Distributed Data Storage Using HashGraph
    Tian J.
    Wang Y.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2020, 57 (12) : 2703 - 2716