Personality Profiling from Text: Introducing Part-of-Speech N-Grams

Authors
Wright, William R. [1]
Chin, David N. [1]
Affiliation
[1] Univ Hawaii Manoa, Dept Informat & Comp Sci, Honolulu, HI 96822 USA
Keywords
personality; classifier; part-of-speech n-grams; information gain; support vector machine
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A support vector machine is trained to classify the Five Factor personality of writers of free text. For each of the five personality dimensions, writers are classified as high/low, with the mean personality score for that dimension used as the dividing point. Writers are also separately classified as high/medium/low, with division points at one standard deviation above and below the mean. With 5-fold cross-validation, the average two-class accuracy of 80.6% is much better than the 50% baseline accuracy (always picking the most likely class), but the three-class accuracy is only slightly better than its baseline (by 7.4%), because most writers fall into the medium class due to the normal distribution of personality values. Features include bag of words, essay length, word sentiment, negation count, and part-of-speech (POS) n-grams. The consistently positive contribution of POS n-grams (averaging 4.8% and 5.8% for the two-class and three-class cases, respectively) is analyzed in detail. The information gain of the most predictive features for each of the five personality dimensions is presented and discussed.
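The sketch below illustrates three ideas named in the abstract: counting POS n-grams, splitting scores into two or three classes around the mean, and scoring a feature by information gain. It is not the authors' implementation. It assumes the text has already been tagged by some POS tagger, the Penn Treebank-style tags are only examples, and all function names are hypothetical. Treating a score exactly at the mean as "low" and using the population standard deviation are also assumptions, since the abstract does not specify either.

```python
from collections import Counter
from math import log2
from statistics import mean, pstdev


def pos_ngrams(tags, n):
    """Count contiguous n-grams over a sequence of part-of-speech tags."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


def two_class(scores):
    """Label each score 'high' or 'low' relative to the sample mean.

    Assumption: a score exactly equal to the mean is labeled 'low'.
    """
    m = mean(scores)
    return ["high" if s > m else "low" for s in scores]


def three_class(scores):
    """Label scores 'high'/'medium'/'low' with cut points at mean +/- 1 SD.

    Assumption: population standard deviation (pstdev) is used.
    """
    m, sd = mean(scores), pstdev(scores)
    return [
        "high" if s > m + sd else "low" if s < m - sd else "medium"
        for s in scores
    ]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in Counter(labels).values())


def information_gain(present, labels):
    """Entropy reduction in labels from knowing a binary feature's value."""
    gain = entropy(labels)
    for value in (True, False):
        subset = [lab for p, lab in zip(present, labels) if p == value]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain


# Illustrative data only; these are not values from the paper.
tags = ["PRP", "VBP", "RB", "JJ"]
bigram_counts = pos_ngrams(tags, 2)
scores = [2.0, 3.0, 4.0, 5.0, 6.0]
binary_labels = two_class(scores)
ternary_labels = three_class(scores)
```

In a full pipeline, counts like `bigram_counts` would be combined with the other features (bag of words, essay length, and so on) into one vector per writer before training a classifier. That step is omitted here.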
Pages: 243-253
Page count: 11