Social Network Multilingual Author Profiling using character and POS n-grams

Cited by: 0
Authors
Gonzalez-Gallardo, Carlos-Emiliano [1]
Torres-Moreno, Juan-Manuel [2]
Rendon, Azucena Montes [3]
Sierra, Gerardo [4]
Affiliations
[1] LIA Univ Avignon, GIL Inst Ingn UNAM, Avignon, France
[2] LIA Univ Avignon, Ecole Polytech Montreal, Avignon, France
[3] CENIDET, Avignon, France
[4] GIL Inst Ingn UNAM, Mexico City, DF, Mexico
Source
LINGUAMATICA | 2016 / Vol. 8 / No. 01
Keywords
Text Mining; Machine Learning; Text Classification; n-grams; Blogs; Tweets; Social Networks;
DOI
Not available
Chinese Library Classification (CLC) number
H0 [Linguistics];
Discipline classification code
030303 ; 0501 ; 050102 ;
Abstract
In this paper we present an algorithm that combines the stylistic features represented by character and POS n-grams to classify multilingual social network documents. In both n-gram groups, a dynamic normalization by context was applied to extract all the possible stylistic information encoded in the documents (emoticons, character flooding, capital letters, references to other users, hyperlinks, hashtags, etc.). The algorithm was applied to two different corpora: the Author Profiling training tweets of PAN-CLEF 2015 (Rangel et al., 2015) and the corpus of "Comments of Mexico City in time" (CCDMX). Results show an accuracy of up to 90%.
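As a rough illustration of the idea in the abstract (not the authors' implementation), the following minimal Python sketch normalizes context-dependent items and then counts character and POS n-grams. The placeholder tokens (<url>, <user>, <tag>), the flooding rule, and the function names are assumptions made for this example. The abstract does not specify them. POS tags are assumed to come from an external tagger, which is not shown.

```python
import re
from collections import Counter

# Assumed patterns and placeholder tokens; the abstract does not give the real ones.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
FLOOD_RE = re.compile(r"(.)\1{2,}")  # the same character repeated 3 or more times


def normalize(text):
    """Replace hyperlinks, user references and hashtags with fixed placeholders."""
    text = URL_RE.sub("<url>", text)
    text = MENTION_RE.sub("<user>", text)
    text = HASHTAG_RE.sub("<tag>", text)
    # Collapse character flooding to exactly three repeats,
    # so that "soooo" and "soooooo" yield the same n-grams.
    return FLOOD_RE.sub(r"\1\1\1", text)


def char_ngrams(text, n):
    """Count overlapping character n-grams of length n."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def pos_ngrams(tags, n):
    """Count overlapping n-grams over a list of POS tags (tagger not shown)."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


if __name__ == "__main__":
    tweet = "@bob see http://t.co/x #fun soooooo"
    clean = normalize(tweet)
    print(clean)
    print(char_ngrams(clean, 3).most_common(3))
    print(pos_ngrams(["DET", "NOUN", "VERB"], 2))
```

The resulting counters could be merged into one feature vector per document and passed to any standard classifier. The choice of classifier is outside what the abstract describes.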
Pages: 21-29
Page count: 9
Related papers
50 in total
  • [1] Efficient Social Network Multilingual Classification using Character, POS n-grams and Dynamic Normalization
    Gonzalez-Gallardo, Carlos-Emiliano
    Torres-Moreno, Juan-Manuel
    Rendon, Azucena Montes
    Sierra, Gerardo
    KDIR: PROCEEDINGS OF THE 8TH INTERNATIONAL JOINT CONFERENCE ON KNOWLEDGE DISCOVERY, KNOWLEDGE ENGINEERING AND KNOWLEDGE MANAGEMENT - VOL. 1, 2016, : 307 - 314
  • [2] Which Granularity to Bootstrap a Multilingual Method of Document Alignment: Character N-grams or Word N-grams?
    Lecluze, Charlotte
    Rigouste, Lois
    Giguet, Emmanuel
    Lucas, Nadine
    CORPUS RESOURCES FOR DESCRIPTIVE AND APPLIED STUDIES. CURRENT CHALLENGES AND FUTURE DIRECTIONS: SELECTED PAPERS FROM THE 5TH INTERNATIONAL CONFERENCE ON CORPUS LINGUISTICS (CILC2013), 2013, 95 : 473 - 481
  • [3] Author Assertion of Furtive Write Print Using Character N-Grams
    Hassan, Feryal H.
    Chaurasia, Mousmi A.
    FUTURE INFORMATION TECHNOLOGY, 2011, 13 : 274 - 278
  • [4] Spam detection using character N-grams
    Kanaris, Ioannis
    Kanaris, Konstantinos
    Stamatatos, Efstathios
    ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2006, 3955 : 95 - 104
  • [5] Author verification using syntactic N-grams
    Center for Computing Research , Instituto Politécnico Nacional , Mexico City, Mexico
    CEUR Workshop Proc.,
  • [6] Authorship Attribution in Portuguese Using Character N-grams
    Markov, Ilia
    Baptista, Jorge
    Pichardo-Lagunas, Obdulia
    ACTA POLYTECHNICA HUNGARICA, 2017, 14 (03) : 59 - 78
  • [7] A first approach to CLIR using character n-grams alignment
    Vilares, Jesus
    Oakes, Michael P.
    Tait, John I.
    EVALUATION OF MULTILINGUAL AND MULTI-MODAL INFORMATION RETRIEVAL, 2007, 4730 : 111 - +
  • [8] Detection of Opinion Spam with Character n-grams
    Hernandez Fusilier, Donato
    Montes-y-Gomez, Manuel
    Rosso, Paolo
    Guzman Cabrera, Rafael
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (CICLING 2015), PT II, 2015, 9042 : 285 - 294
  • [9] Predicting Political Donations Using Twitter Hashtags and Character N-Grams
    Conrad, Colin
    Keselj, Vlado
    2016 IEEE 18TH CONFERENCE ON BUSINESS INFORMATICS (CBI), VOL. 2, 2016, : 1 - 7
  • [10] Language Identification in Multilingual, Short and Noisy Texts using Common N-Grams
    Kosmajac, Dijana
    Keselj, Vlado
    2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2017, : 2752 - 2759