BadRock at SemEval-2024 Task 8: DistilBERT to Detect Multigenerator, Multidomain and Multilingual Black-Box Machine-Generated Text

Author
Siino, Marco [1]
Affiliation
[1] Univ Catania, Dept Elect Elect & Comp Engn, Catania, Italy
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The rise of Large Language Models (LLMs) has brought about a notable shift: these models are now ubiquitous and readily accessible. Machine-generated content has surged across diverse platforms, including social media, news outlets, educational platforms, question-answering forums, and even academic settings. Recent LLMs such as ChatGPT and GPT-4 produce coherent, contextually relevant responses to a broad spectrum of user queries. The fluency and sophistication of these texts make LLMs compelling candidates for replacing human labour in numerous applications. However, this proliferation of machine-generated content has raised concerns about misuse, including the spread of misinformation and the disruption of education. Because humans perform only marginally better than chance at distinguishing machine-generated from human-written text, there is a pressing need for automated systems that can identify machine-generated text accurately, with the overarching goal of curbing its misuse. This paper describes the approach we adopted for our participation in SemEval-2024 Task 8. Specifically, we detail how we fine-tuned a DistilBERT model and used it to classify each sample in the provided test set. Our submission reaches an accuracy of 0.754, compared with 0.231 for the lowest-scoring submission in the competition.
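The reported scores (0.754 for this submission, 0.231 for the lowest-scoring one) are classification accuracies: the fraction of test samples whose predicted label matches the gold label. The minimal sketch below illustrates only that metric, not the author's DistilBERT pipeline. The function name `accuracy` and the label encoding (0 = human-written, 1 = machine-generated) are assumptions chosen for illustration.

```python
from typing import Sequence


def accuracy(gold: Sequence[int], predicted: Sequence[int]) -> float:
    """Return the fraction of samples whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if len(gold) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count the positions where the prediction agrees with the reference label.
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Hypothetical toy labels; assumed encoding: 0 = human-written, 1 = machine-generated.
gold_labels = [0, 1, 1, 0]
predicted_labels = [0, 1, 0, 0]
print(accuracy(gold_labels, predicted_labels))  # 0.75
```

On this scale, an accuracy of 0.754 means that roughly three out of every four test samples were labelled correctly.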
Pages: 239-245
Number of pages: 7