Quality assessment of collaboratively-created web content with no manual intervention based on soft multi-view generation

Cited by: 2
Authors
Goncalves Magalhaes, Luiz Felipe [1]
Goncalves, Marcos Andre [1]
Canuto, Sergio Daniel [1]
Dalip, Daniel H. [2]
Cristo, Marco [3]
Calado, Pavel [4]
Affiliations
[1] Univ Fed Minas Gerais, Av Antonio Carlos 6627, BR-31270901 Belo Horizonte, MG, Brazil
[2] Ctr Fed Ensino Tecnol Minas Gerais, Belo Horizonte, MG, Brazil
[3] Univ Fed Amazonas, Manaus, Amazonas, Brazil
[4] Univ Lisbon, Inst Super Tecn, INESC ID, Porto Salvo, Portugal
Keywords
Multi-view; Machine learning; Information retrieval; Automatic text quality assessment; Ensembles
DOI
10.1016/j.eswa.2019.04.053
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automated quality assessment of collaboratively created Web content is important to guarantee scalability and lack of bias. The state-of-the-art solution for this problem relies on multi-view learning, where quality is considered a multifaceted concept that can be learned from human assessments. To this end, features describing quality have been devised and grouped into views based on criteria such as text structure, readability, style, and user edit history. Determining the views and properly combining them requires the assistance of an expert, which is difficult in scenarios where the views overlap or are hard for humans to interpret. In this work we propose an automatic view generator, specifically designed for the problem of automated content quality assessment with no manual intervention. Automatic view generation is achieved by finding clusters of highly correlated features. This process is performed iteratively, by automatically creating new clusters, evaluating them, and keeping those that perform best. Experiments on three popular Wiki datasets show that our automated views reduce the classification error of the original features by up to 20%. This is achieved by automatically generating views that are very similar to those built manually, while keeping only a small set of features to reduce noise and overfitting. © 2019 Elsevier Ltd. All rights reserved.
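The abstract's core idea, forming views from clusters of highly correlated features, can be illustrated with a minimal sketch. This is not the authors' algorithm: it swaps in a simple greedy grouping with a fixed Pearson-correlation threshold, and it omits the iterative create/evaluate/keep loop the abstract describes. The names `pearson`, `group_correlated_features`, the `threshold` parameter, and the toy data are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def group_correlated_features(columns, threshold=0.8):
    """Greedily group feature columns into candidate views.

    A feature joins the first view whose seed (first) feature it correlates
    with at |r| >= threshold; otherwise it starts a new view.
    Assumption: a fixed threshold stands in for the paper's iterative search.
    """
    views = []
    for j, col in enumerate(columns):
        for view in views:
            if abs(pearson(columns[view[0]], col)) >= threshold:
                view.append(j)
                break
        else:
            views.append([j])
    return views


# Toy data (hypothetical): features 0 and 1 track each other, as do 2 and 3.
columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.1, 3.9, 6.2, 8.0, 9.8, 12.1],
    [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
    [0.9, -1.1, 1.2, -0.8, 1.0, -1.0],
]
views = group_correlated_features(columns)
print(views)
```

In a fuller pipeline, each candidate view would be used to train its own classifier and kept only if it improved validation performance, which is the evaluation step this sketch leaves out.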
Pages: 226-238
Page count: 13