Using Content-Based Features for Author Profiling of Vietnamese Forum Posts

Times cited: 3
Authors
Duc Tran Duong [1]
Son Bao Pham [2]
Hanh Tan [1]
Affiliations
[1] Posts and Telecommunications Institute of Technology, Hanoi, Vietnam
[2] Vietnam National University, University of Engineering and Technology, Faculty of Information Technology, Hanoi, Vietnam
Keywords
Identification
DOI
10.1007/978-3-319-31277-4_25
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper reports the results of an author profiling task on Vietnamese forum posts, which aims to identify personal traits of the author, such as gender, age, occupation, and location, using content-based features. To compare their performance, experiments were conducted with different types of features, namely stylometric features (such as lexical, syntactic, and structural features) and content-based features (the most important words), on data sets we collected from various Vietnamese forums. Three learning methods were tested: Decision Tree, Bayes Network, and Support Vector Machine (SVM); SVM achieved the best results. The results show that these kinds of features work well on short, free-style messages such as forum posts, and that content-based features yield much better results than stylometric features.
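To make the contrast between the two feature families concrete, here is a minimal Python sketch under stated assumptions. The abstract does not define the exact features or say how "the most important words" were chosen, so everything here is a hypothetical stand-in: the small set of lexical and structural features, the regex tokenizer (which yields Vietnamese syllables rather than segmented words), and chi-square word selection. The resulting vectors would then be passed to a classifier such as an SVM.

```python
import re
import unicodedata
from collections import Counter

# Assumption: regex tokenization on NFC-normalized text. For Vietnamese this
# yields syllables rather than words; a real pipeline would use a word segmenter.
TOKEN_RE = re.compile(r"\w+")


def tokenize(post: str) -> list[str]:
    """Normalize to NFC, lowercase, and split a post into syllable tokens."""
    text = unicodedata.normalize("NFC", post).lower()
    return TOKEN_RE.findall(text)


def stylometric_features(post: str) -> dict[str, float]:
    """Hypothetical subset of lexical and structural stylometric features."""
    tokens = tokenize(post)
    n_tokens = len(tokens)
    n_chars = len(post) or 1  # guard against empty posts
    n_punct = sum(unicodedata.category(ch).startswith("P") for ch in post)
    return {
        # Lexical features
        "n_tokens": float(n_tokens),
        "avg_token_len": sum(map(len, tokens)) / n_tokens if n_tokens else 0.0,
        "type_token_ratio": len(set(tokens)) / n_tokens if n_tokens else 0.0,
        # Structural features
        "n_lines": float(post.count("\n") + 1),
        "punct_ratio": n_punct / n_chars,
        "upper_ratio": sum(ch.isupper() for ch in post) / n_chars,
    }


def chi_square_top_words(posts: list[str], labels: list[str], k: int) -> list[str]:
    """Rank words by their best per-class chi-square score and return the top k.

    Chi-square over binary word presence is a stand-in selection criterion,
    not necessarily the one used in the paper.
    """
    docs = [set(tokenize(p)) for p in posts]
    n = len(docs)
    class_size = Counter(labels)
    df = Counter(w for doc in docs for w in doc)  # document frequency per word
    df_in_class = Counter((w, y) for doc, y in zip(docs, labels) for w in doc)

    scores: dict[str, float] = {}
    for w, n_w in df.items():
        best = 0.0
        for cls, size in class_size.items():
            a = df_in_class[(w, cls)]  # in class, contains w
            b = n_w - a                # other classes, contains w
            c = size - a               # in class, lacks w
            d = n - n_w - c            # other classes, lacks w
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[w] = best
    # Highest score first; ties broken by string order for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


def content_features(post: str, vocab: list[str]) -> list[float]:
    """Count occurrences of each selected vocabulary word in the post."""
    counts = Counter(tokenize(post))
    return [float(counts[w]) for w in vocab]


if __name__ == "__main__":
    # Toy posts with hypothetical labels "A" and "B", for illustration only.
    posts = ["tôi thích bóng đá", "bóng đá hay quá",
             "tôi thích nấu ăn", "nấu ăn ngon quá"]
    labels = ["A", "A", "B", "B"]
    vocab = chi_square_top_words(posts, labels, k=4)
    for post in posts:
        style = list(stylometric_features(post).values())
        print(style, content_features(post, vocab))
```

Keeping the stylometric and content-based vectors separate, rather than concatenating them, makes it straightforward to train one classifier per feature family and compare them. The abstract reports exactly this kind of comparison. Any classifier that accepts fixed-length numeric vectors can consume them, for example scikit-learn's `sklearn.svm.LinearSVC`.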
Pages: 287-296
Number of pages: 10