Unstructured Data Extraction in Distributed NoSQL

Cited by: 0
Authors
Lomotey, Richard K. [1 ]
Deters, Ralph [1 ]
Affiliation
[1] Univ Saskatchewan, Dept Comp Sci, Saskatoon, SK S7N 0W0, Canada
Keywords
Unstructured data; big data; Hidden Markov Model (HMM); terms extraction; NoSQL; Re-usable dictionary; Association rules;
DOI
Not available
Chinese Library Classification
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
While "Big data" has brought good tidings in terms of easy access to voluminous data, it also poses challenges. The existing Knowledge Discovery in Databases (KDD) processes, which were proposed for schema-oriented data sources, are no longer applicable because today's data is unstructured. Previously, we deployed a tool called TouchR, which relies on the Hidden Markov Model (HMM) to extract terms from unstructured data sources (specifically, NoSQL databases). This paper advances the initially deployed version by introducing a re-usable dictionary and association rules to improve the quality of the extracted terms. The tool in its present stage is also more adaptable to user searches, based on the most frequently searched terms.
Pages: 160-165
Number of pages: 6