Detecting authorship deception: a supervised machine learning approach using author writeprints

Times cited: 31
Authors
Pearl, Lisa [1]
Steyvers, Mark [1]
Affiliation
[1] Univ Calif Irvine, Dept Cognit Sci, Irvine, CA 92697 USA
Source
LITERARY AND LINGUISTIC COMPUTING | 2012 / Vol. 27 / No. 02
Keywords
ATTRIBUTION
DOI
10.1093/llc/fqs003
Chinese Library Classification (CLC) number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
We describe a new supervised machine learning approach for detecting authorship deception, a specific type of authorship attribution task particularly relevant for cybercrime forensic investigations, and demonstrate its validity on two case studies drawn from realistic online data sets. The core of our approach involves identifying uncharacteristic behavior for an author, based on a writeprint extracted from unstructured text samples of the author's writing. The writeprints used here involve stylometric features and content features derived from topic models, an unsupervised approach for identifying relevant keywords that relate to the content areas of a document. One innovation of our approach is to transform the writeprint feature values into a representation that individually balances characteristic and uncharacteristic traits of an author, and we subsequently apply a Sparse Multinomial Logistic Regression classifier to this novel representation. Our method yields high accuracy for authorship deception detection on the two case studies, confirming its utility.
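The abstract outlines a pipeline (extract a writeprint, re-express feature values relative to the author, then classify with a sparse logistic regression) but does not give the exact features or transformation. The Python sketch below illustrates that general shape under loudly stated assumptions: the features (mean word length, type-token ratio, and a tiny hypothetical function-word list) are illustrative stand-ins, the topic-model content features are omitted entirely, `to_deviation` (absolute z-score against the author's own writeprint) is a hypothetical stand-in for the paper's characteristic/uncharacteristic representation, and the classifier is a two-class L1-penalized logistic regression fit by proximal gradient descent (ISTA) rather than the SMLR algorithm itself. All function names here are hypothetical.

```python
import math
import re
from statistics import mean, pstdev

# HYPOTHETICAL, tiny function-word list; real writeprints use far richer feature sets.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "i", "you")


def stylometric_features(text):
    """Map a text to a small stylometric feature vector (illustrative only)."""
    words = re.findall(r"[a-z']+", text.lower())
    n = max(len(words), 1)
    features = [
        sum(len(w) for w in words) / n,  # mean word length
        len(set(words)) / n,  # type-token ratio (vocabulary richness)
    ]
    features += [words.count(fw) / n for fw in FUNCTION_WORDS]  # relative frequencies
    return features


def build_writeprint(samples):
    """Summarize an author's known texts as per-feature (means, standard deviations)."""
    columns = list(zip(*(stylometric_features(s) for s in samples)))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # constant feature -> use 1.0, avoid /0
    return means, stds


def to_deviation(features, writeprint):
    """HYPOTHETICAL transform: |z-score| of each feature against the author's writeprint.
    Values near 0 are characteristic of the author; large values are uncharacteristic."""
    means, stds = writeprint
    return [abs(f - m) / s for f, m, s in zip(features, means, stds)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrinks v toward 0 and sets it to 0 when |v| <= t."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def predict_proba(x, w, b):
    """Estimated probability that the text behind feature vector x is deceptive."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def train_l1_logreg(X, y, lam=0.01, lr=0.1, epochs=1000):
    """Two-class L1-penalized logistic regression via proximal gradient descent (ISTA).
    Labels: 1 = deceptive, 0 = genuine. A simplified stand-in, not the SMLR optimizer."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi  # d(log loss)/d(logit)
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        b -= lr * grad_b  # the intercept is not penalized
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


if __name__ == "__main__":
    known = [
        "I think the plan is fine and I will send it to you in the morning.",
        "I sent the notes to you and the team in the afternoon.",
        "I will call you and the others in a day or two.",
    ]
    writeprint = build_writeprint(known)
    questioned = "Henceforth, correspondence shall be dispatched exclusively via intermediaries."
    print(to_deviation(stylometric_features(questioned), writeprint))
```

In use, each labeled text (genuine = 0, deceptive = 1) would be mapped through `stylometric_features` and `to_deviation` before training, so the classifier sees how far a text strays from its claimed author rather than raw feature values. The L1 soft-thresholding step sets uninformative weights to exactly zero, which mirrors the sparsity goal that motivates using a sparse classifier such as SMLR.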
Pages: 183-196
Page count: 14
Related papers
  • [1] A supervised machine learning approach to author disambiguation in the Web of Science
    Rehs, Andreas
    [J]. JOURNAL OF INFORMETRICS, 2021, 15 (03)
  • [2] Detecting deception using machine learning with facial expressions and pulse rate
    Tsuchiya, Kento
    Hatano, Ryo
    Nishiyama, Hiroyuki
    [J]. ARTIFICIAL LIFE AND ROBOTICS, 2023, 28 (03) : 509 - 519
  • [3] Detecting Mislabeled Data Using Supervised Machine Learning Techniques
    Poel, Mannes
    [J]. AUGMENTED COGNITION: NEUROCOGNITION AND MACHINE LEARNING, AC 2017, PT I, 2017, 10284 : 571 - 581
  • [4] Detecting insurance fraud using supervised and unsupervised machine learning
    Debener, Joern
    Heinke, Volker
    Kriebel, Johannes
    [J]. JOURNAL OF RISK AND INSURANCE, 2023, 90 (03) : 743 - 768
  • [5] Detecting Cyberbullying in Social Commentary Using Supervised Machine Learning
    Raza, Muhammad Owais
    Memon, Mohsin
    Bhatti, Sania
    Bux, Rahim
    [J]. ADVANCES IN INFORMATION AND COMMUNICATION, VOL 2, 2020, 1130 : 621 - 630
  • [6] Detecting Significant Events in Lecture Video using Supervised Machine Learning
    Brooks, Christopher
    Amundson, Kristofor
    Greer, Jim
    [J]. ARTIFICIAL INTELLIGENCE IN EDUCATION: BUILDING LEARNING SYSTEMS THAT CARE: FROM KNOWLEDGE REPRESENTATION TO AFFECTIVE MODELLING, 2009, 200 : 483 - +
  • [7] Detecting Under-Resolved Flow Physics Using Supervised Machine Learning
    Hedayat, Amirpasha
    Ollivier-Gooch, Carl
    [J]. AIAA JOURNAL, 2023, 61 (09) : 3958 - 3975
  • [8] In the Zone: Towards Detecting Student Zoning Out Using Supervised Machine Learning
    Drummond, Joanna
    Litman, Diane
    [J]. INTELLIGENT TUTORING SYSTEMS, PART II, 2010, 6095 : 306 - 308
  • [9] DETECTING MALICIOUS PDF DOCUMENTS USING SEMI-SUPERVISED MACHINE LEARNING
    Jiang, Jianguo
    Song, Nan
    Yu, Min
    Chow, Kam-Pui
    Li, Gang
    Liu, Chao
    Huang, Weiqing
    [J]. ADVANCES IN DIGITAL FORENSICS XVII, 2021, 612 : 135 - 155