Quality Evaluation for Documental Big Data

Times cited: 0
Authors
Fugini, Mariagrazia [1]
Finocchi, Jacopo [1]
Affiliation
[1] Politecn Milan, Dipartimento Elettron Informaz & Bioingn, Piazza L Da Vinci 32, Milan, Italy
Keywords
Text Analytics; Big Data Analytics; Enterprise Content Management; Document Management; Machine Learning for Document Processing; DATA ANALYTICS;
DOI
10.5220/0009394301320139
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
This paper analyzes the quality of a textual Big Data Analytics approach developed within a project that built a Big Data platform shared among three companies. The paper focuses on documental Big Data: within the project, the work presented here extracts knowledge from document and process data in a Big Data environment and evaluates the quality of the processed data. Performance indexes such as correctness, precision, and efficiency are used to evaluate the quality of the extraction and classification process. The novelty of the approach is that no document types are predefined; instead, after new types are processed manually, the resulting datasets are continuously set up as training sets for a Machine Learning step that learns the new document types. The paper presents the document management architecture and discusses the main results.
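As a rough illustration of the performance indexes named in the abstract, the Python sketch below computes two of them for a document classifier. The definitions are assumptions, not taken from the paper: correctness is read as the share of documents whose predicted type matches the manually assigned type, and precision as, for each predicted type, the fraction of those predictions that are correct. The function name `evaluate` and the document types ("invoice", "contract", "receipt") are hypothetical; efficiency (e.g., documents processed per second) is not covered.

```python
from collections import Counter


def evaluate(true_types, predicted_types):
    """Return (correctness, per-type precision) for a document classifier.

    Assumed definitions (hypothetical, not from the paper):
      correctness   = correctly typed documents / all documents
      precision[t]  = correct predictions of type t / all predictions of type t
    """
    if len(true_types) != len(predicted_types):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(true_types, predicted_types))
    if not pairs:
        return 0.0, {}

    # Overall share of documents assigned the right type.
    correct = sum(1 for t, p in pairs if t == p)
    correctness = correct / len(pairs)

    # How often each type was predicted, and how often that prediction was right.
    predicted_counts = Counter(predicted_types)
    hits = Counter(p for t, p in pairs if t == p)
    precision = {label: hits[label] / n for label, n in predicted_counts.items()}
    return correctness, precision


if __name__ == "__main__":
    # Hypothetical manually assigned vs. predicted document types.
    truth = ["invoice", "invoice", "contract", "receipt"]
    pred = ["invoice", "contract", "contract", "receipt"]
    correctness, precision = evaluate(truth, pred)
    print(correctness)  # 0.75
    print(precision)    # {'invoice': 1.0, 'contract': 0.5, 'receipt': 1.0}
```

Here one invoice is mistyped as a contract, so correctness drops to 0.75 and only the "contract" type loses precision, since half of its predictions are wrong.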
Pages: 132-139
Number of pages: 8
Related papers
  • [1] Big Data Quality: A Quality Dimensions Evaluation
    Taleb, Ikbal
    El Kassabi, Hadeel T.
    Serhani, Mohamed Adel
    Dssouli, Rachida
    Bouhaddioui, Chafik
    [J]. 2016 INT IEEE CONFERENCES ON UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING AND COMMUNICATIONS, CLOUD AND BIG DATA COMPUTING, INTERNET OF PEOPLE, AND SMART WORLD CONGRESS (UIC/ATC/SCALCOM/CBDCOM/IOP/SMARTWORLD), 2016, : 759 - 765
  • [2] Laboratory Data Quality Evaluation in the Big Data Era
    Kim, Sollip
    [J]. ANNALS OF LABORATORY MEDICINE, 2023, 43 (05) : 399 - 400
  • [3] Documental analysis related to big data and its impact in human rights
    Tellez Carvajal, Evelyn
    [J]. DERECHO PUCP, 2020, (84) : 155 - 188
  • [4] Research on Comprehensive Evaluation of Data Source Quality in Big Data Environment
    Li, Wenquan
    Xu, Suping
    Peng, Xindong
    [J]. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2021, 14 (01) : 1831 - 1841
  • [5] Data and Process Quality Evaluation in a Textual Big Data Archiving System
    Fugini, Mariagrazia
    Finocchi, Jacopo
    [J]. ACM JOURNAL ON COMPUTING AND CULTURAL HERITAGE, 2022, 15 (01):
  • [6] BIG DATA, BIG DATA QUALITY PROBLEM
    Becker, David
    McMullen, Bill
    King, Trish Dunn
    [J]. PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2015, : 2644 - 2653
  • [7] Learning Quality Evaluation of MOOC Based on Big Data Analysis
    Zhao, Zihao
    Wu, Qiangqiang
    Chen, Haopeng
    Wan, Chengcheng
    [J]. SMART COMPUTING AND COMMUNICATION, SMARTCOM 2016, 2017, 10135 : 277 - 286
  • [8] Research on Product Quality Evaluation Based on Big Data Analysis
    Song, Huaming
    Cao, Zhexiu
    [J]. 2017 IEEE 2ND INTERNATIONAL CONFERENCE ON BIG DATA ANALYSIS (ICBDA), 2017, : 178 - 182
  • [9] From Data Quality to Big Data Quality
    Batini, Carlo
    Rula, Anisa
    Scannapieco, Monica
    Viscusi, Gianluigi
    [J]. JOURNAL OF DATABASE MANAGEMENT, 2015, 26 (01) : 60 - 82
  • [10] MEDICAL BIG DATA AND BIG DATA QUALITY PROBLEMS
    Hoffman, Sharona
    [J]. CONNECTICUT INSURANCE LAW JOURNAL, 2014, 21 (01) : 289 - 316