SummScreen: A Dataset for Abstractive Screenplay Summarization

Authors
Chen, Mingda [1]
Chu, Zewei [3]
Wiseman, Sam [1, 2]
Gimpel, Kevin [1]
Affiliations
[1] Toyota Technol Inst Chicago, Chicago, IL 60637 USA
[2] Duke Univ, Durham, NC 27706 USA
[3] Univ Chicago, Chicago, IL 60637 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We introduce SummScreen, a summarization dataset comprised of pairs of TV series transcripts and human-written recaps. The dataset provides a challenging testbed for abstractive summarization for several reasons. Plot details are often expressed indirectly in character dialogues and may be scattered across the entirety of the transcript. These details must be found and integrated to form the succinct plot descriptions in the recaps. Also, TV scripts contain content that does not directly pertain to the central plot but rather serves to develop characters or provide comic relief. This information is rarely contained in recaps. Since characters are fundamental to TV series, we also propose two entity-centric evaluation metrics. Empirically, we characterize the dataset by evaluating several methods, including neural models and those based on nearest neighbors. An oracle extractive approach outperforms all benchmarked models according to automatic metrics, showing that the neural models are unable to fully exploit the input transcripts. Human evaluation and qualitative analysis reveal that our non-oracle models are competitive with their oracle counterparts in terms of generating faithful plot events and can benefit from better content selectors. Both oracle and non-oracle models generate unfaithful facts, suggesting future research directions.
Pages: 8602-8615
Page count: 14
Related papers
50 in total
  • [1] CivilSum: A Dataset for Abstractive Summarization of Indian Court Decisions
    Malik, Manuj
    Zhao, Zheng
    Fonseca, Marcio
    Rao, Shrisha
    Cohen, Shay B.
    [J]. PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 2241 - 2250
  • [2] BIGPATENT: A Large-Scale Dataset for Abstractive and Coherent Summarization
    Sharma, Eva
    Li, Chen
    Wang, Lu
    [J]. 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 2204 - 2213
  • [3] TIB: A Dataset for Abstractive Summarization of Long Multimodal Videoconference Records
    Gigant, Theo
    Dufaux, Frederic
    Guinaudeau, Camille
    Decombas, Marc
    [J]. 20TH INTERNATIONAL CONFERENCE ON CONTENT-BASED MULTIMEDIA INDEXING, CBMI 2023, 2023, : 61 - 70
  • [4] Abstractive text summarization using deep learning with a new Turkish summarization benchmark dataset
    Ertam, Fatih
    Aydin, Galip
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (09):
  • [5] Exploring Abstractive Text Summarization: Methods, Dataset, Evaluation, and Emerging Challenges
    Sunusi, Yusuf
    Omar, Nazlia
    Zakaria, Lailatul Qadri
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (07) : 1340 - 1357
  • [6] CATAMARAN: A Cross-lingual Long Text Abstractive Summarization Dataset
    Chen, Zheng
    Lin, Hongyu
    [J]. LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 6932 - 6937
  • [7] WikiLingua: A New Benchmark Dataset for Cross-Lingual Abstractive Summarization
    Ladhak, Faisal
    Durmus, Esin
    Cardie, Claire
    McKeown, Kathleen
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 4034 - 4048
  • [8] End to End Urdu Abstractive Text Summarization With Dataset and Improvement in Evaluation Metric
    Raza, Hassan
    Shahzad, Waseem
    [J]. IEEE ACCESS, 2024, 12 : 40311 - 40324
  • [9] A comparative study of abstractive and extractive summarization techniques to label subgroups on patent dataset
    Souza, Cinthia M.
    Meireles, Magali R. G.
    Almeida, Paulo E. M.
    [J]. SCIENTOMETRICS, 2021, 126 (01) : 135 - 156