Classification of Human and Machine-Generated Texts Using Lexical Features and Supervised/Unsupervised Machine Learning Algorithms

Cited by: 0
Authors
Rojas-Simon, Jonathan [1]
Ledeneva, Yulia [1]
Arnulfo Garcia-Hernandez, Rene [1]
Affiliations
[1] Autonomous Univ State Mexico, Inst Literario 100, Toluca 50000, State Of Mexico, Mexico
Keywords
Large-Language Models (LLMs); AuTexTification; Lexical Features; Supervised/Unsupervised Learning Algorithms; Text representation models
DOI
10.1007/978-3-031-62836-8_31
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In today's digital information era, distinguishing between human- and machine-generated texts has become a focus of study in academia and industry, because Large-Language Models (LLMs) can produce high-quality texts that challenge the legitimacy and authenticity of written content. It is therefore essential to create methods and models that can determine whether a human or an LLM wrote a text. This paper explores the effectiveness of supervised and unsupervised machine learning algorithms that use lexical features. We focus on traditional algorithms: Multilayer Perceptron (MLP), Naive Bayes (NB), Logistic Regression (LR), Agglomerative Hierarchical Clustering (AHC), and K-means Clustering (KC). The results are compared with the state-of-the-art approaches presented in the Automated Text Identification (AuTexTification) shared task, which serve as reference methods. We find that both NB and KC may achieve competitive results in this task.
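As a rough illustration of the kind of approach the abstract describes, the sketch below trains a Naive Bayes classifier on simple lexical features. It is not the authors' implementation: the whitespace tokenizer, the add-one smoothing, the class name `NaiveBayes`, and the four toy training sentences are all assumptions made for the example, and the record does not state which lexical features or data the paper actually used.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercased whitespace tokens stand in for lexical features.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[c] / total_docs)
            denom = sum(self.word_counts[c].values()) + self.alpha * len(self.vocab)
            for token in tokenize(text):
                if token in self.vocab:  # skip words never seen in training
                    score += math.log((self.word_counts[c][token] + self.alpha) / denom)
            scores[c] = score
        return max(scores, key=scores.get)


# Hypothetical toy data, invented for this sketch only.
texts = [
    "honestly i loved the movie lol",
    "we got lost but the trip was fun",
    "as an ai language model i can summarize the topic",
    "in conclusion the topic has several important aspects",
]
labels = ["human", "human", "machine", "machine"]

model = NaiveBayes().fit(texts, labels)
print(model.predict("in conclusion the topic is important"))
```

A realistic experiment would replace the toy sentences with a labeled corpus such as the AuTexTification data and report held-out metrics; the unsupervised algorithms named in the abstract (AHC, KC) would cluster the same feature vectors without using the labels.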
Pages: 331 - 341
Number of pages: 11
Related papers
50 entries in total
  • [21] A Deep Fusion Model for Human vs. Machine-Generated Essay Classification
    Corizzo, Roberto
    Leal-Arenas, Sebastian
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023
  • [22] Automated Well-Log Processing and Lithology Classification by Identifying Optimal Features Through Unsupervised and Supervised Machine-Learning Algorithms
    Singh, Harpreet
    Seol, Yongkoo
    Myshakin, Evgeniy M.
    SPE JOURNAL, 2020, 25 (05) : 2778 - 2800
  • [23] Performance Analysis of Supervised Machine Learning Algorithms for Text Classification
    Mishu, Sadia Zaman
    Rafiuddin, S. M.
    PROCEEDINGS OF THE 2016 19TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (ICCIT), 2016 : 409 - 413
  • [24] Evaluation of Supervised Machine Learning Classification Algorithms for Fingerprint Recognition
    Rojas, Andres
    Dolecek, Gordana Jovanovic
    PROCEEDINGS OF 2021 GLOBAL CONGRESS ON ELECTRICAL ENGINEERING (GC-ELECENG 2021), 2021 : 1 - 4
  • [25] Evaluation of Classification for Project Features with Machine Learning Algorithms
    Fan, Ching-Lung
    SYMMETRY-BASEL, 2022, 14 (02)
  • [26] Detecting insurance fraud using supervised and unsupervised machine learning
    Debener, Joern
    Heinke, Volker
    Kriebel, Johannes
    JOURNAL OF RISK AND INSURANCE, 2023, 90 (03) : 743 - 768
  • [27] Supervised Rainfall Learning Model Using Machine Learning Algorithms
    Sharma, Amit Kumar
    Chaurasia, Sandeep
    Srivastava, Devesh Kumar
    INTERNATIONAL CONFERENCE ON ADVANCED MACHINE LEARNING TECHNOLOGIES AND APPLICATIONS (AMLTA2018), 2018, 723 : 275 - 283
  • [28] Supervised and unsupervised machine learning approaches for tree classification using multiwavelength airborne polarimetric LiDAR
    Hu, Zhong
    Tan, Songxin
    SMART AGRICULTURAL TECHNOLOGY, 2025, 11
  • [29] Investigation of Epileptic Seizure Signatures Classification in EEG using Supervised Machine Learning Algorithms
    Al-jumaili, Saif
    Duru, Adil Deniz
    Ibrahim, Abdullahi Abdu
    Ucan, Osman Nuri
    TRAITEMENT DU SIGNAL, 2023, 40 (01) : 43 - 54
  • [30] Encrypted DNP3 Traffic Classification Using Supervised Machine Learning Algorithms
    de Toledo, Thais
    Torrisi, Nunzio
    MACHINE LEARNING AND KNOWLEDGE EXTRACTION, 2019, 1 (01) : 384 - 399