BadRock at SemEval-2024 Task 8: DistilBERT to Detect Multigenerator, Multidomain and Multilingual Black-Box Machine-Generated Text

Cited by: 0
Authors
Siino, Marco [1]
Affiliations
[1] Univ Catania, Dept Elect Elect & Comp Engn, Catania, Italy
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The rise of Large Language Models (LLMs) has made them increasingly ubiquitous and readily accessible. Across social media, news outlets, educational platforms, question-answering forums, and even academic domains, there has been a notable surge in machine-generated content. Recent LLMs such as ChatGPT and GPT-4 exhibit a remarkable ability to produce coherent and contextually relevant responses across a broad spectrum of user inquiries. The fluency and sophistication of these generated texts position LLMs as compelling candidates for substituting human labour in numerous applications. Nevertheless, this proliferation of machine-generated content has raised concerns about potential misuse, including the dissemination of misinformation and the disruption of educational ecosystems. Given that humans only marginally outperform random chance in distinguishing machine-generated from human-authored text, there is a pressing need for automated systems that can accurately detect machine-generated text and thereby curb its potential misuse. This paper describes the approach we adopted for our participation in SemEval-2024 Task 8. Specifically, we detail the fine-tuning and use of a DistilBERT model to classify each sample in the provided test set. Our submission reaches an accuracy of 0.754, compared with 0.231 for the worst result obtained at the competition.
Pages: 239-245
Page count: 7
Related papers
48 in total
  • [31] TrustAI at SemEval-2024 Task 8: A Comprehensive Analysis of Multi-domain Machine Generated Text Detection Techniques
    Urlana, Ashok
    Saibewar, Aditya
    Garlapati, Bala Mallikarjunarao
    Kumar, Charaka Vinayak
    Singh, Ajeet Kumar
    Chalamala, Srinivasa Rao
    PROCEEDINGS OF THE 18TH INTERNATIONAL WORKSHOP ON SEMANTIC EVALUATION, SEMEVAL-2024, 2024, : 927 - 934
  • [32] PetKaz at SemEval-2024 Task 8: Can Linguistics Capture the Specifics of LLM-generated Text?
    Petukhova, Kseniia
    Kazakov, Roman
    Kochmar, Ekaterina
    PROCEEDINGS OF THE 18TH INTERNATIONAL WORKSHOP ON SEMANTIC EVALUATION, SEMEVAL-2024, 2024, : 1140 - 1147
  • [33] Mast Kalandar at SemEval-2024 Task 8: On the Trail of Textual Origins: RoBERTa-BiLSTM Approach to Detect AI-Generated Text
    Bafna, Jainit Sushil
    Mittal, Hardik
    Sethia, Suyash
    Shrivastava, Manish
    Mamidi, Radhika
    PROCEEDINGS OF THE 18TH INTERNATIONAL WORKSHOP ON SEMANTIC EVALUATION, SEMEVAL-2024, 2024, : 1627 - 1633
  • [34] Team MLab at SemEval-2024 Task 8: Analyzing Encoder Embeddings for Detecting LLM-generated Text
    Li, Kevin
    Hasanaliyev, Kenan
    Zhu, Sally
    Altshuler, George
    Eberts, Alden
    Chen, Eric
    Wang, Kate
    Xia, Emily
    Browne, Eli
    Chen, Ian
    Eren, Umut
    PROCEEDINGS OF THE 18TH INTERNATIONAL WORKSHOP ON SEMANTIC EVALUATION, SEMEVAL-2024, 2024, : 1463 - 1467
  • [35] Mast Kalandar at SemEval-2024 Task 8: On the Trail of Textual Origins: RoBERTa-BiLSTM Approach to Detect AI-Generated Text
    Bafna, Jainit Sushil
    Mittal, Hardik
    Sethia, Suyash
    Shrivastava, Manish
    Mamidi, Radhika
    arXiv,
  • [36] M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection
    Wang, Yuxia
    Mansurov, Jonibek
    Ivanov, Petar
    Su, Jinyan
    Shelmanov, Artem
    Tsvigun, Akim
    Afzal, Osama Mohammed
    Mahmoud, Tarek
    Puccetti, Giovanni
    Arnold, Thomas
    Aji, Alham Fikri
    Habash, Nizar
    Gurevych, Iryna
    Nakov, Preslav
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 3964 - 3992
  • [37] iimasNLP at SemEval-2024 Task 8: Unveiling structure-aware language models for automatic generated text identification
    Valdez, Andric
    Gomez-Adorno, Helena
    Marquez, Fernando
    Pantaleon, Jorge
    Bel-Enguix, Gemma
    PROCEEDINGS OF THE 18TH INTERNATIONAL WORKSHOP ON SEMANTIC EVALUATION, SEMEVAL-2024, 2024, : 1110 - 1114
  • [38] SINAI at SemEval-2024 Task 8: Fine-tuning on Words and Perplexity as Features for Detecting Machine Written Text
    Gutierrez-Megias, Alberto J.
    Urena-Lopez, L. Alfonso
    Martinez-Camara, Eugenio
    PROCEEDINGS OF THE 18TH INTERNATIONAL WORKSHOP ON SEMANTIC EVALUATION, SEMEVAL-2024, 2024, : 1505 - 1510
  • [39] Mashee at SemEval-2024 Task 8: The Impact of Samples Quality on the Performance of In-Context Learning for Machine Text Classification
    Rasheed, Areeg Fahad
    Zarkoosh, M.
    PROCEEDINGS OF THE 18TH INTERNATIONAL WORKSHOP ON SEMANTIC EVALUATION, SEMEVAL-2024, 2024, : 60 - 63
  • [40] YNU-HPCC at SemEval-2024 Task 1: Self-Instruction Learning with Black-box Optimization for Semantic Textual Relatedness
    Li, Weijie
    Wang, Jin
    Zhang, Xuejie
    PROCEEDINGS OF THE 18TH INTERNATIONAL WORKSHOP ON SEMANTIC EVALUATION, SEMEVAL-2024, 2024, : 792 - 799