Stephen Colbert at SemEval-2023 Task 5: Using Markup for Classifying Clickbait

Cited by: 0
Authors
Spreitzer, Sabrina [1]
Hoai Nam Tran [1]
Affiliations
[1] Univ Regensburg, Regensburg, Germany
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
For SemEval-2023 Task 5, we submitted three DeBERTaV3(LARGE) models for the first subtask, classifying the spoiler type (passage, phrase, multi) of clickbait web articles. Basic parameters such as sequence length were chosen with BERT(BASE) uncased; further approaches were then tested with DeBERTaV3(BASE), and only the most promising ones were moved to DeBERTaV3(LARGE). Our research showed that the placement of information on webpages is often optimized with regard to, for example, ad placement. This information is usually described in a webpage's markup, which is why we pursued an approach that takes markup into account. Overall, we did not beat the baseline, which we attribute to three reasons. First, we crawled markup only for Huffington Post articles and extracted only <p> and <a> tags, which do not cover enough aspects of a webpage's design. Second, Huffington Post articles are overrepresented in the given dataset, which, third, is imbalanced with respect to the spoiler tags. We strongly suggest re-annotating the dataset so that markup-optimized models such as MarkupLM or TIE can be used, and clearing it of embedded articles such as "Yahoo" and archives such as "archive.is" or "web.archive" to avoid noise. The imbalance should also be addressed by adding articles from sources other than Huffington Post, keeping in mind that multi-tagged entries should be balanced against passage- and phrase-tagged ones.
Pages: 1844-1848
Number of pages: 5
Related papers
  • [1] SemEval-2023 Task 5: Clickbait Spoiling
    Froebe, Maik
    Gollub, Tim
    Stein, Benno
    Hagen, Matthias
    Potthast, Martin
    17th International Workshop on Semantic Evaluation (SemEval-2023), 2023: 2275-2286
  • [2] Brooke-English at SemEval-2023 Task 5: Clickbait Spoiling
    Tang, Shirui
    17th International Workshop on Semantic Evaluation (SemEval-2023), 2023: 64-76
  • [3] nancy-hicks-gribble at SemEval-2023 Task 5: Classifying and generating clickbait spoilers with RoBERTa
    Keller, Jueri
    Rehbach, Nicolas
    Zafar, Ibrahim
    17th International Workshop on Semantic Evaluation (SemEval-2023), 2023: 1712-1717
  • [4] Matt Bai at SemEval-2023 Task 5: Clickbait spoiler classification via BERT
    Tailor, Nukit
    Mamidi, Radhika
    17th International Workshop on Semantic Evaluation (SemEval-2023), 2023: 1067-1068
  • [5] Francis Wilde at SemEval-2023 Task 5: Clickbait Spoiler Type Identification with Transformers
    Indurthi, Vijayasaradhi
    Varma, Vasudeva
    17th International Workshop on Semantic Evaluation (SemEval-2023), 2023: 1890-1893
  • [6] Clark Kent at SemEval-2023 Task 5: SVMs, Transformers, and Pixels for Clickbait Spoiling
    Mihalcea, Dragos-Stefan
    Nisioi, Sergiu
    17th International Workshop on Semantic Evaluation (SemEval-2023), 2023: 1204-1212
  • [7] Gallagher at SemEval-2023 Task 5: Tackling Clickbait with Seq2Seq Models
    Bilgis, Tugay
    Bozdag, Nimet Beyza
    Bethard, Steven
    17th International Workshop on Semantic Evaluation (SemEval-2023), 2023: 1650-1655
  • [8] Billy-Batson at SemEval-2023 Task 5: An Information Condensation based System for Clickbait Spoiling
    Sharma, Anubhav
    Joshi, Sagar
    Abhishek, Tushar
    Mamidi, Radhika
    Varma, Vasudeva
    17th International Workshop on Semantic Evaluation (SemEval-2023), 2023: 1878-1889
  • [9] Mr-wallace at SemEval-2023 Task 5: Novel Clickbait Spoiling Algorithm Using Natural Language Processing
    Saravanan, Vineet
    Wilson, Steven
    17th International Workshop on Semantic Evaluation (SemEval-2023), 2023: 1625-1629
  • [10] Sabrina Spellman at SemEval-2023 Task 5: Discover the Shocking Truth Behind this Composite Approach to Clickbait Spoiling!
    Birkenheuer, Simon
    Drechsel, Jonathan
    Justen, Paul
    Poehlmann, Jimmy
    Gonsior, Julius
    Reusch, Anja
    17th International Workshop on Semantic Evaluation (SemEval-2023), 2023: 969-977