Query-based summarization of discussion threads

Cited by: 6
Authors
Verberne, Suzan [1]
Krahmer, Emiel [2]
Wubben, Sander [2]
van den Bosch, Antal [3, 4]
Affiliations
[1] Leiden Univ, Leiden Inst Adv Comp Sci, Leiden, Netherlands
[2] Tilburg Univ, Tilburg Sch Humanities, Tilburg, Netherlands
[3] Radboud Univ Nijmegen, Ctr Language Studies, Nijmegen, Netherlands
[4] Meertens Inst, Amsterdam, Netherlands
Keywords
query-based summarization; discussion forums; reference summaries; word embeddings; evaluation; AGREEMENT; NETWORKS; DOCUMENT;
DOI
10.1017/S1351324919000123
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we address query-based summarization of discussion threads. New users can profit from the information shared in the forum if they can retrieve previously posted information. However, discussion threads on a single topic can easily comprise dozens or hundreds of individual posts. Our aim is to summarize forum threads given real web search queries. We created a data set with search queries from a discussion forum's search engine log and the discussion threads that were clicked by the user who entered the query. For 120 thread-query combinations, a reference summary was made by five different human raters. We compared two methods for automatic summarization of the threads: a query-independent method based on post features, and Maximum Marginal Relevance (MMR), a method that takes the query into account. We also compared four different word embedding representations as alternatives to standard word vectors in extractive summarization. We find (1) that the agreement between human summarizers does not improve when a query is provided; (2) that the query-independent post features as well as a centroid-based baseline outperform MMR by a large margin; (3) that combining the post features with query similarity gives a small improvement over the use of post features alone; and (4) that for the word embeddings, a match in domain appears to be more important than corpus size and dimensionality. However, the differences between the models were not reflected by differences in the quality of the summaries created with the help of these models. We conclude that query-based summarization with web queries is challenging because the queries are short, and a click on a result is not a direct indicator of the relevance of the result.
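The abstract names Maximum Marginal Relevance (MMR) as the query-aware method. As a rough illustration of the general technique only, the sketch below greedily selects posts by trading off similarity to the query against similarity to posts already selected. It is not the authors' implementation: the whitespace tokenization, bag-of-words cosine similarity, the function names, and the `lam` and `k` parameters are all assumptions made for this example.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words counts (whitespace tokenization, an assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query, posts, k=2, lam=0.7):
    """Return the indices of up to k posts chosen greedily by MMR.

    Each step picks the candidate maximizing
    lam * sim(post, query) - (1 - lam) * max sim(post, already selected).
    lam and k are hypothetical defaults, not values from the paper.
    """
    q = bow(query)
    vecs = [bow(p) for p in posts]
    selected = []
    candidates = list(range(len(posts)))

    def score(i):
        relevance = cosine(vecs[i], q)
        redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    thread = [
        "battery drains fast on my phone",
        "battery drains fast on my phone too",
        "try replacing the charger cable",
    ]
    print(mmr_select("phone battery drains", thread, k=2, lam=0.5))
```

With `lam=1.0` the selection reduces to plain query-similarity ranking; lower values penalize near-duplicate posts more strongly, which matters in threads where many posts repeat the same point.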
Pages: 3-29
Number of pages: 27
Related papers
50 in total
  • [31] Query-based multi-documents summarization using linguistic knowledge and content word expansion
    Asad Abdi
    Norisma Idris
    Rasim M. Alguliyev
    Ramiz M. Aliguliyev
    Soft Computing, 2017, 21 : 1785 - 1801
  • [32] Query-Based Data Pricing
    Koutris, Paraschos
    Upadhyaya, Prasang
    Balazinska, Magdalena
    Howe, Bill
    Suciu, Dan
    JOURNAL OF THE ACM, 2015, 62 (05)
  • [33] Dynamic query-based debugging
    Lencevicius, R
    Hölzle, U
    Singh, AK
    ECOOP'99 - OBJECT-ORIENTED PROGRAMMING, 1999, 1628 : 135 - 160
  • [34] A query-based quantum eigensolver
    Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Sichuan, China
    Unknown
    Unknown
    Unknown
    Quantum Eng., 2020, 3
  • [35] Snapshot query-based debugging
    Potanin, A
    Noble, J
    Biddle, R
    2004 AUSTRALIAN SOFTWARE ENGINEERING CONFERENCE, PROCEEDINGS, 2004, : 251 - 259
  • [36] Information Extraction from Lengthy Legal Contracts: Leveraging Query-Based Summarization and GPT-3.5
    Zin, May Myo
    Ha Thanh Nguyen
    Satoh, Ken
    Sugawara, Saku
    Nishino, Fumihito
    LEGAL KNOWLEDGE AND INFORMATION SYSTEMS, 2023, 379 : 177 - 186
  • [37] D2S: Document-to-Slide Generation Via Query-Based Text Summarization
    Sun, Edward
    Hou, Yufang
    Wang, Dakuo
    Zhang, Yunfeng
    Wang, Nancy X. R.
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 1405 - 1418
  • [38] Query-Based Access Control for Ontologies
    Knechtel, Martin
    Stuckenschmidt, Heiner
    WEB REASONING AND RULE SYSTEMS, 2010, 6333 : 73 - +
  • [39] Query-based learning of XPath expressions
    Carme, Julien
    Ceresna, Michal
    Goebel, Max
    GRAMMATICAL INFERENCE: ALGORITHMS AND APPLICATIONS, PROCEEDINGS, 2006, 4201 : 342 - 343
  • [40] Regularizing query-based retrieval scores
    Diaz, Fernando
    INFORMATION RETRIEVAL, 2007, 10 (06) : 531 - 562