Authorship Attribution for Social Media Forensics

Times cited: 105
Authors
Rocha, Anderson [1]
Scheirer, Walter J. [2]
Forstall, Christopher W. [2]
Cavalcante, Thiago [1]
Theophilo, Antonio [3, 4]
Shen, Bingyu [2]
Carvalho, Ariadne R. B. [1]
Stamatatos, Efstathios [5]
Affiliations
[1] Univ Estadual Campinas, Inst Comp, BR-13083852 Campinas, SP, Brazil
[2] Univ Notre Dame, Dept Comp Sci & Engn, Notre Dame, IN 46556 USA
[3] Univ Estadual Campinas, Inst Comp, BR-13083852 Campinas, SP, Brazil
[4] Ctr Informat Technol Renato Archer, BR-13069901 Campinas, SP, Brazil
[5] Univ Aegean, Dept Informat & Commun Syst Engn, Karlovassi 83200, Greece
Funding
São Paulo Research Foundation (FAPESP), Brazil
Keywords
Authorship attribution; forensics; social media; machine learning; computational linguistics; stylometry; IDENTIFICATION; FEATURES; CLASSIFIERS; MODELS;
DOI
10.1109/TIFS.2016.2603960
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline code
081202
Abstract
The veil of anonymity provided by smartphones with pre-paid SIM cards, public Wi-Fi hotspots, and distributed networks like Tor has drastically complicated the task of identifying users of social media during forensic investigations. In some cases, the text of a single posted message will be the only clue to an author's identity. How can we accurately predict who that author might be when the message may never exceed 140 characters on a service like Twitter? For the past 50 years, linguists, computer scientists, and scholars of the humanities have been jointly developing automated methods to identify authors based on the style of their writing. All authors possess peculiarities of habit that influence the form and content of their written works. These characteristics can often be quantified and measured using machine learning algorithms. In this paper, we provide a comprehensive review of the methods of authorship attribution that can be applied to the problem of social media forensics. Furthermore, we examine emerging supervised learning-based methods that are effective for small sample sizes, and provide step-by-step explanations for several scalable approaches as instructional case studies for newcomers to the field. We argue that there is a significant need in forensics for new authorship attribution algorithms that can exploit context, can process multi-modal data, and are tolerant to incomplete knowledge of the space of all possible authors at training time.
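The abstract notes that an author's stylistic habits can be quantified and measured with machine learning. As a rough illustration only, and not the method proposed in this paper, the Python sketch below represents each candidate author by a profile of character 3-grams pooled from their known messages, then attributes a new short message to the author whose profile is most similar under cosine similarity. The function names (`char_ngrams`, `build_profiles`, `attribute`), the choice of n = 3, and the `threshold` parameter (which returns no author when even the best match is weak, a crude nod to authors unseen at training time) are hypothetical choices made for this example.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List, Optional, Tuple


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping, case-folded character n-grams (a common stylometric feature)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())  # missing grams count as 0
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_profiles(training: Dict[str, List[str]], n: int = 3) -> Dict[str, Counter]:
    """Hypothetical helper: pool the n-gram counts of all known messages per author."""
    profiles: Dict[str, Counter] = {}
    for author, messages in training.items():
        profile: Counter = Counter()
        for message in messages:
            profile.update(char_ngrams(message, n))
        profiles[author] = profile
    return profiles


def attribute(message: str, profiles: Dict[str, Counter], n: int = 3,
              threshold: float = 0.0) -> Tuple[Optional[str], float]:
    """Return (most similar author, score), or (None, score) if score < threshold.

    The threshold is a hypothetical, crude stand-in for open-set handling,
    i.e. the true author may not be among the known candidates.
    """
    query = char_ngrams(message, n)
    best_author: Optional[str] = None
    best_score = -1.0
    for author, profile in profiles.items():
        score = cosine(query, profile)
        if score > best_score:
            best_author, best_score = author, score
    if best_score < threshold:
        return None, best_score
    return best_author, best_score


if __name__ == "__main__":
    # Toy, made-up messages purely for illustration.
    training = {
        "alice": ["omg lol see u tmrw!!", "lol that was sooo funny!!"],
        "bob": ["Please find the report attached.", "Regards, see the attached report."],
    }
    profiles = build_profiles(training)
    author, score = attribute("lol sooo tired!!", profiles)
    print(author, round(score, 3))  # author is "alice": none of the query's trigrams occur in bob's messages
```

Character n-grams are used here because they need no tokenizer and still capture habits such as repeated punctuation or elongated words, which matter for very short messages; a practical system would typically swap the nearest-profile rule for a trained classifier and calibrate any rejection threshold on held-out data.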
Pages: 5-33
Number of pages: 29
Related papers
  • [1] Authorship Attribution of Social Media Messages
    Theophilo, Antonio
    Giot, Romain
    Rocha, Anderson
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2023, 10 (01) : 10 - 23
  • [2] EXPLAINABLE ARTIFICIAL INTELLIGENCE FOR AUTHORSHIP ATTRIBUTION ON SOCIAL MEDIA
    Theophilo, Antonio
    Padilha, Rafael
    Andalo, Fernanda A.
    Rocha, Anderson
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 2909 - 2913
  • [3] Semantic Clustering and Transfer Learning in Social Media Texts Authorship Attribution
    Fedotova, Anastasia
    Kurtukova, Anna
    Romanov, Aleksandr
    Shelupanov, Alexander
    IEEE ACCESS, 2024, 12 : 39783 - 39803
  • [4] Astroturfing detection in social media: Using binary n-gram analysis for authorship attribution
    Peng, Jian
    Choo, Kim-Kwang Raymond
    Ashman, Helen
    2016 IEEE TRUSTCOM/BIGDATASE/ISPA, 2016, : 121 - 128
  • [5] Data Forensics On Social Media
    Doultani, Mannat Amit
    Vijayalakshmi, M.
    2019 IEEE 5TH INTERNATIONAL CONFERENCE FOR CONVERGENCE IN TECHNOLOGY (I2CT), 2019
  • [6] Media forensics on social media platforms: a survey
    Pasquini, Cecilia
    Amerini, Irene
    Boato, Giulia
    EURASIP JOURNAL ON INFORMATION SECURITY, 2021, 2021 (01)
  • [8] AUTHORSHIP ATTRIBUTION
    HOLMES, DI
    COMPUTERS AND THE HUMANITIES, 1994, 28 (02) : 87 - 106
  • [9] Authorship Attribution for textual data on Online Social Networks
    Banga, Ritu
    Mehndiratta, Pulkit
    2017 TENTH INTERNATIONAL CONFERENCE ON CONTEMPORARY COMPUTING (IC3), 2017, : 155 - 161
  • [10] A novel approach of mining write-prints for authorship attribution in e-mail forensics
    Iqbal, Farkhund
    Hadjidj, Rachid
    Fung, Benjamin C. M.
    Debbabi, Mourad
    DIGITAL INVESTIGATION, 2008, 5 : S42 - S51