Genetic Optimization of Keyword Subsets in the Classification Analysis of Authorship of Texts

Times cited: 0
Authors
Pavlyshenko, Bohdan [1]
Affiliations
[1] Ivan Franko Lviv Natl Univ, UA-79005 Lvov, Ukraine
DOI
10.1080/09296174.2014.944329
Chinese Library Classification number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
This paper analyses the genetic selection of keyword sets whose text frequencies serve as attributes in text classification. The genetic optimization was performed on a set of words forming the fraction of a frequency dictionary that lies within given frequency limits. The frequency dictionary was built from a corpus of English fiction texts. The error of the k-nearest-neighbours classifier was used as the fitness function minimized by the genetic algorithm. The results show high precision and recall when texts are classified by author using the frequencies of the keyword sets that the genetic algorithm selected from the frequency dictionary.
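The procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the abstract does not state the genetic operators, the value of k, or how the error was estimated, so the truncation selection, one-point crossover, bit-flip mutation, leave-one-out error, the function names `knn_error` and `genetic_select`, and the toy frequency matrix below are all assumptions made for illustration. Each candidate keyword subset is a bit mask over the dictionary words, and its fitness is the k-nearest-neighbours error computed on the selected word frequencies only.

```python
import random
from collections import Counter

def knn_error(X, y, mask, k=3):
    """Leave-one-out k-NN error using only the columns where mask is 1 (assumed fitness)."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 1.0  # an empty keyword set is treated as the worst possible subset
    errors = 0
    for i, xi in enumerate(X):
        # Squared Euclidean distance to every other document, paired with its label.
        dists = sorted(
            (sum((xi[j] - xo[j]) ** 2 for j in cols), y[o])
            for o, xo in enumerate(X) if o != i
        )
        votes = Counter(label for _, label in dists[:k])
        if votes.most_common(1)[0][0] != y[i]:
            errors += 1
    return errors / len(X)

def genetic_select(X, y, generations=30, pop_size=20, p_mut=0.05, k=3, seed=0):
    """Search for a keyword mask that minimizes knn_error (hypothetical GA settings)."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: knn_error(X, y, m, k))
        parents = pop[: pop_size // 2]  # truncation selection; the best masks survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)            # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    best = min(pop, key=lambda m: knn_error(X, y, m, k))
    return best, knn_error(X, y, best, k)

# Toy data (invented): rows are texts, columns are frequencies of four candidate words.
# Columns 0 and 1 separate the two authors; columns 2 and 3 are noise.
X = [
    [5.0, 0.2, 1.0, 3.0],
    [4.8, 0.1, 3.0, 1.0],
    [5.2, 0.3, 2.0, 2.0],
    [0.2, 5.1, 1.1, 2.9],
    [0.1, 4.9, 2.9, 1.1],
    [0.3, 5.0, 2.1, 2.1],
]
y = ["A", "A", "A", "B", "B", "B"]

best, err = genetic_select(X, y)
print("selected mask:", best, "leave-one-out error:", err)
```

In a realistic setting the columns would be the frequencies of the dictionary words that fall within the chosen frequency limits, and the error would normally be estimated on held-out texts rather than by leave-one-out on a handful of documents.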
Pages: 341-349
Number of pages: 9
Related papers
  • [1] Classification analysis of authorship fiction texts in the space of semantic fields
    Pavlyshenko, Bohdan
    JOURNAL OF QUANTITATIVE LINGUISTICS, 2013, 20 (03) : 218 - 226
  • [2] Keyword Analysis Visualization for Chinese Historical Texts
    Zeng, Jihui
    Zhan, Beibei
    Zhang, Shao
    Bie, Jiajun
    Xiao, Sheng
    PROCEEDINGS OF THE 12TH INTERNATIONAL SYMPOSIUM ON VISUAL INFORMATION COMMUNICATION AND INTERACTION, VINCI 2019, 2019
  • [3] THE CLASSIFICATION AND ANALYSIS OF TEXTS
    FRANKE, W
    ZEITSCHRIFT FUR GERMANISTISCHE LINGUISTIK, 1987, 15 (03): : 263 - 281
  • [4] Information systems frontiers: Keyword analysis and classification
    Bang, Chulhwan Chris
    INFORMATION SYSTEMS FRONTIERS, 2015, 17 (01) : 217 - 237
  • [5] Information systems frontiers: Keyword analysis and classification
    Chulhwan Chris Bang
    Information Systems Frontiers, 2015, 17 : 217 - 237
  • [6] Analysis of keyword extraction methods for legal document classification
    Marinato, Matheus S.
    Santana, Ewaldo E. C.
    Jacob Jr, Antonio F. L.
    REVISTA BRASILEIRA DE COMPUTACAO APLICADA, 2024, 16 (02): : 88 - 96
  • [7] THE COMPARATIVE ANALYSIS OF EFFICIENCY OF ALGORITHMS OF TEXTS AUTHORSHIP RECOGNITION ON TRANSITIONS FREQUENCIES
    Poddubny, V. V.
    Shevelyov, O. G.
    Fatyhov, A. A.
    TOMSK STATE UNIVERSITY JOURNAL, 2006, (290): : 232 - +
  • [8] Whose American Government? A Quantitative Analysis of Gender and Authorship in American Politics Texts
    Cassese, Erin C.
    Bos, Angela L.
    Schneider, Monica C.
    JOURNAL OF POLITICAL SCIENCE EDUCATION, 2014, 10 (03) : 253 - 272
  • [9] MEDLATINEPI and MEDLATINLIT: Two Datasets for the Computational Authorship Analysis of Medieval Latin Texts
    Corbara, Silvia
    Moreo, Alejandro
    Sebastiani, Fabrizio
    Tavoni, Mirko
    ACM JOURNAL ON COMPUTING AND CULTURAL HERITAGE, 2022, 15 (03):
  • [10] Design and analysis of genetic algorithm based Chinese keyword extracting
    Gao, Kai
    Zhang, Hua-Ping
    Xu, Yun-Feng
    Gao, Guo-Jiang
    Li, Yang-Jie
    INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS IN TECHNOLOGY, 2013, 48 (01) : 27 - 35