Feature Selection for Enhanced Author Identification of Turkish Text

Cited by: 3
Authors
Bay, Yasemin [1]
Celebi, Erbug [2]
Affiliations
[1] Cyprus Int Univ, Management Informat Syst, TR-10 Lefkosa, Mersin, Turkey
[2] Cyprus Int Univ, Dept Comp Engn, TR-10 Lefkosa, Mersin, Turkey
Keywords
Author identification; Text classification; Machine learning; Feature selection
DOI
10.1007/978-3-319-22635-4_34
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
The rapid growth of the Internet and the increasing availability of electronic documents pose problems such as identifying the author of an anonymous text and detecting plagiarism. This study aims to determine the author of a given document from among a set of authors whose documents are known. Although a large body of author identification research has been conducted on English over the last century, Turkish and other languages have attracted interest only in the last decade. This study therefore addresses the author identification problem using two Turkish datasets collected from two different Turkish newspapers. Together, the datasets comprise 850 columns written by 17 columnists, 50 columns per columnist. Four machine learning algorithms (Naive Bayes, Support Vector Machine, K-Nearest Neighbor, and Decision Tree) were employed, and the K-Nearest Neighbor algorithm achieved 99.7% accuracy. With the Chi-square feature selection method, which reduced the number of features from 20 to 17, the classification achieved full recognition.
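The pipeline summarized in the abstract (score features with a Chi-square statistic, keep the top-k, then classify with K-Nearest Neighbor) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy feature matrix, the helper names (`chi2_scores`, `select_k_best`, `project`, `knn_predict`), Euclidean distance, and the neighbor count are all hypothetical, since this record does not list the 20 candidate features or the classifier settings. The Chi-square variant shown (class-wise observed versus expected feature totals, similar to scikit-learn's `chi2`) is one common formulation; which variant the paper used is not stated here.

```python
import math
from collections import Counter

# Hypothetical stylometric feature counts for six training documents by two
# authors (A and B). Illustrative only; not data from the paper.
X_TRAIN = [
    [5, 0, 2],
    [6, 1, 2],
    [5, 1, 2],
    [0, 5, 2],
    [1, 6, 2],
    [0, 5, 2],
]
Y_TRAIN = ["A", "A", "A", "B", "B", "B"]


def chi2_scores(X, y):
    """Score each non-negative feature by summing (observed - expected)^2 / expected
    over classes: observed is the feature total within a class, expected assumes
    the overall total is spread in proportion to class size (assumed variant)."""
    n = len(y)
    class_sizes = Counter(y)
    scores = []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        score = 0.0
        for c, size in class_sizes.items():
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = total * size / n
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_k_best(X, y, k):
    """Return indices (in original column order) of the k highest-scoring features."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def project(X, keep):
    """Keep only the selected feature columns of each row."""
    return [[row[j] for j in keep] for row in X]


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k training rows nearest to x (Euclidean distance)."""
    neighbors = sorted(
        (math.dist(row, x), label) for row, label in zip(X_train, y_train)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Feature 2 is identical for every document, so its Chi-square score is 0
# and it is the one dropped when keeping k=2 of 3 features.
keep = select_k_best(X_TRAIN, Y_TRAIN, k=2)
X_reduced = project(X_TRAIN, keep)
query = project([[5, 1, 2]], keep)[0]
print(keep)                                         # [0, 1]
print(knn_predict(X_reduced, Y_TRAIN, query, k=3))  # A
```

In the study's setting the analogous call would keep 17 of the 20 candidate features; a feature whose values are spread evenly across authors, like column 2 above, carries no class information and scores zero.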
Pages: 371-379
Number of pages: 9