Instance Based Authorship Attribution for Kannada Text Using Amalgamation of Character and Word N-grams Technique

Cited by: 2
Authors
Chandrika, C. P. [1]
Kallimani, Jagadish S. [1, 2]
Affiliations
[1] M S Ramaiah Inst Technol, Bangalore 560054, India
[2] Visvesvaraya Technol Univ, Belagavi, Karnataka, India
Keywords
Authorship attribution; Decision tree; Instance based approach; Machine learning algorithms; Naive Bayes; Profile based approach; Random forest; Support vector machine
DOI
10.1007/978-981-19-2281-7_51
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Authorship Attribution is the task of identifying the true author of a given text from a set of suspected authors. Stylometric features, which include lexical and syntactic features, play a vital role in recognizing the right author. N-grams are one of the popular techniques used to extract syntactic features from text. The main objective of this work is to use both lexical and syntactic features on Kannada text and to compare the performance of the two approaches using different machine learning algorithms. Kannada is spoken in the southern Indian state of Karnataka. Although there is substantial work on text processing, Authorship Attribution is still at an early stage. Research has been carried out on handwritten Kannada documents but not on digital text. Character n-grams, word n-grams, and the combination of the two, known as the Amalgamation technique, are used as syntactic features to capture the writing style of an author. The results show that the Support Vector Machine algorithm performs best, with 94% accuracy using n-grams and 60% accuracy using lexical features.
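The amalgamation of character and word n-grams described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the n-gram sizes (3 for characters, 2 for words), whitespace tokenization, and the "c:"/"w:" key prefixes are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of `text`.

    Note: this slices by Unicode code point, so a Kannada syllable made of
    several code points may be split across n-grams (an assumption of this
    sketch, not something stated in the abstract).
    """
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Return all overlapping word n-grams, splitting on whitespace."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def amalgamated_features(text, char_n=3, word_n=2):
    """Count character and word n-grams in a single feature dictionary.

    Keys are prefixed ("c:" or "w:") so the two feature families
    cannot collide when they are merged.
    """
    features = Counter()
    features.update("c:" + g for g in char_ngrams(text, char_n))
    features.update("w:" + g for g in word_ngrams(text, word_n))
    return features


# Hypothetical usage: one feature dictionary per document (instance).
sample_features = amalgamated_features("ab ab", char_n=2, word_n=2)
```

In an instance-based setup, one such feature dictionary would be built per document and passed, after vectorization, to a classifier such as a Support Vector Machine; that step is not shown here.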
Pages: 547-557
Number of pages: 11
Related papers
50 in total
  • [1] Authorship Attribution in Portuguese Using Character N-grams
    Markov, Ilia
    Baptista, Jorge
    Pichardo-Lagunas, Obdulia
    [J]. ACTA POLYTECHNICA HUNGARICA, 2017, 14 (03) : 59 - 78
  • [2] An improved N-grams based Model for Authorship Attribution
    Boughaci, Dalila
    Benmesbah, Mounir
    Zebiri, Aniss
    [J]. 2019 INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCES (ICCIS), 2019, : 70 - 75
  • [3] Authorship Attribution of Ancient Texts Written by Ten Arabic Travelers Using Character N-Grams
    Ouamour, Siham
    Sayoud, Halim
    [J]. 2013 INTERNATIONAL CONFERENCE ON COMPUTER, INFORMATION AND TELECOMMUNICATION SYSTEMS (CITS), 2013,
  • [4] Authorship attribution of Spanish poems using n-grams and the Web as Corpus
    Guzman-Cabrera, Rafael
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 39 (02) : 2391 - 2396
  • [5] Automatic word spacing using probabilistic models based on character n-grams
    Lee, Do-Gil
    Rim, Hae-Chang
    Yook, Dongsuk
    [J]. IEEE INTELLIGENT SYSTEMS, 2007, 22 (01) : 28 - 35
  • [6] Impact of Character n-grams Attention Scores for English and Russian News Articles Authorship Attribution
    Makhmutova, Liliya
    Ross, Robert
    Salton, Giancarlo
    [J]. 38TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2023, 2023, : 939 - 941
  • [7] Complete Syntactic N-grams as Style Markers for Authorship Attribution
    Posadas-Duran, Juan-Pablo
    Sidorov, Grigori
    Batyrshin, Ildar
    [J]. HUMAN-INSPIRED COMPUTING AND ITS APPLICATIONS, PT I, 2014, 8856 : 9 - 17
  • [8] Using Word N-Grams as Features in Arabic Text Classification
    Al-Thubaity, Abdulmohsen
    Alhoshan, Muneera
    Hazzaa, Itisam
    [J]. SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING, 2015, 569 : 35 - 43
  • [9] Authorship Attribution on Kannada Text using Bi-Directional LSTM Technique
    Chandrika, C. P.
    Kallimani, Jagadish S.
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (09) : 963 - 971