On Text-based Mining with Active Learning and Background Knowledge Using SVM

Cited by: 0
Authors
Catarina Silva
Bernardete Ribeiro
Affiliations
[1] CISUC, Departamento de Engenharia Informática, Universidade de Coimbra
[2] Escola Superior de Tecnologia e Gestão, Instituto Politécnico de Leiria
Source
Soft Computing | 2007 / Vol. 11
Keywords
Text mining; Partially labeled data; Support vector machines
DOI
Not available
Abstract
Text mining, intelligent text analysis, text data mining and knowledge discovery in text are commonly used names for the process of extracting relevant and non-trivial information from text. Some crucial issues arise when trying to solve this problem, such as document representation and the scarcity of labeled data. This paper addresses these problems by introducing information from unlabeled documents into the training set, using the support vector machine (SVM) separating margin as the differentiating factor. Besides studying the influence of several pre-processing methods and drawing conclusions on their relative significance, we also evaluate the benefits of introducing background knowledge into an SVM text classifier. We further evaluate the possibility of active learning and propose a method for successfully combining background knowledge and active learning. Experimental results show that the proposed techniques, whether used alone or combined, yield a considerable improvement in classification performance, even when only small labeled training sets are available.
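The abstract does not give the selection rule itself, so the following is only a minimal sketch of how an SVM margin might separate unlabeled documents into two groups: those confident enough to be added with the classifier's own label (background knowledge) and those uncertain enough to be sent to a human annotator (active learning). The function name `split_by_margin`, the threshold `confident_at`, the query budget `n_queries`, and the example scores are all assumptions made for illustration, not values taken from the paper.

```python
from typing import List, Sequence, Tuple


def split_by_margin(
    scores: Sequence[float],
    confident_at: float = 1.0,  # assumed threshold: on or outside the SVM margin
    n_queries: int = 2,         # assumed labeling budget per round
) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Partition unlabeled documents by their SVM decision value.

    `scores[i]` is the signed decision value of document i, as produced by
    an already trained SVM (hypothetical input for this sketch).
    """
    # Background knowledge: documents at or beyond the margin keep the
    # label predicted by the classifier (+1 or -1).
    auto_labeled = [
        (i, 1 if s > 0 else -1)
        for i, s in enumerate(scores)
        if abs(s) >= confident_at
    ]
    # Active learning: documents inside the margin, ordered from the one
    # closest to the separating hyperplane outward; the closest are queried.
    inside = [i for i, s in enumerate(scores) if abs(s) < confident_at]
    inside.sort(key=lambda i: abs(scores[i]))
    to_query = inside[:n_queries]
    return auto_labeled, to_query


# Hypothetical decision values for six unlabeled documents.
scores = [1.7, -0.1, 0.4, -2.3, 0.05, -0.9]
auto_labeled, to_query = split_by_margin(scores)
print(auto_labeled)  # documents added with predicted labels
print(to_query)      # documents sent for manual labeling
```

In a full loop, the classifier would presumably be retrained after each round on the original labeled set plus both groups, but how the paper schedules or weights these steps is not stated in the abstract.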
Pages: 519 - 530
Page count: 11
Related papers
50 in total
  • [31] Active Learning for Text Mining from Crowds
    Shao, Hao
    ADVANCES IN ARTIFICIAL INTELLIGENCE: FROM THEORY TO PRACTICE (IEA/AIE 2017), PT II, 2017, 10351 : 409 - 418
  • [32] Designing a Text-based CAPTCHA Breaker and Solver by using Deep Learning Techniques
    UmaMaheswari, P.
    Ezhilarasi, S.
    Harish, Prithvi
    Gowrishankar, Balachandar
    Sanjiv, S.
    PROCEEDINGS OF 2020 IEEE INTERNATIONAL CONFERENCE ON ADVANCES AND DEVELOPMENTS IN ELECTRICAL AND ELECTRONICS ENGINEERING (ICADEE), 2020, : 106 - 111
  • [33] Improved estimation of the correlation matrix using reinforcement learning and text-based networks
    Lu, Cheng
    Ndiaye, Papa Momar
    Simaan, Majeed
    INTERNATIONAL REVIEW OF FINANCIAL ANALYSIS, 2024, 96
  • [34] Exploring fonts as retrieval cues in text-based learning
    Krieglstein, Felix
    Jansen, Sebastian
    Meusel, Felicia
    Scheller, Nadine
    Schmitz, Manuel
    Wesenberg, Lukas
    Rey, Guenter Daniel
    ACTA PSYCHOLOGICA, 2024, 251
  • [35] ONLINE PROCESSING OF PREEXISTING KNOWLEDGE MISCONCEPTIONS AND TEXT-BASED INCONSISTENCIES
    PRINZO, OV
    DANKS, JH
    BULLETIN OF THE PSYCHONOMIC SOCIETY, 1987, 25 (05) : 350 - 350
  • [36] Validation: Knowledge- and Text-Based Monitoring During Reading
    van Moort, Marianne L.
    Koornneef, Arnout
    van den Broek, Paul W.
    DISCOURSE PROCESSES, 2018, 55 (5-6) : 480 - 496
  • [37] Personal Knowledge Base Construction from Text-based Lifelogs
    Yen, An-Zi
    Huang, Hen-Hsen
    Chen, Hsin-Hsi
    PROCEEDINGS OF THE 42ND INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '19), 2019, : 185 - 194
  • [38] INTERACTIVE EFFECTS OF TEXT-BASED AND TASK-BASED IMPORTANCE ON LEARNING FROM TEXT
    SCHRAW, G
    WADE, SE
    KARDASH, CAM
    JOURNAL OF EDUCATIONAL PSYCHOLOGY, 1993, 85 (04) : 652 - 661
  • [39] Comparison of Image-Based and Text-Based Source Code Classification Using Deep Learning
    Kiyak E.O.
    Cengiz A.B.
    Birant K.U.
    Birant D.
    SN Computer Science, 2020, 1 (5)
  • [40] KID Model Realization Using Memory Networks for Text-based Q/A Analyses and Learning
    Li, Jiandong
    Huang, Runhe
    Wang, Kevin I-Kai
    Cao, Jiannong
    IEEE 17TH INT CONF ON DEPENDABLE, AUTONOM AND SECURE COMP / IEEE 17TH INT CONF ON PERVAS INTELLIGENCE AND COMP / IEEE 5TH INT CONF ON CLOUD AND BIG DATA COMP / IEEE 4TH CYBER SCIENCE AND TECHNOLOGY CONGRESS (DASC/PICOM/CBDCOM/CYBERSCITECH), 2019, : 101 - 108