Unsupervised Topic-Oriented Keyphrase Extraction and Its Application to Croatian

Times cited: 0
Authors
Saratlija, Josip [1]
Snajder, Jan [1]
Basic, Bojana Dalbelo [1]
Affiliations
[1] Univ Zagreb, Fac Elect Engn & Comp, Zagreb 41000, Croatia
Keywords
Information extraction; keyphrase extraction; unsupervised learning; k-means; Croatian language;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Labeling documents with keyphrases is a tedious and expensive task. Most approaches to automatic keyphrase extraction rely on supervised learning and require manually labeled training data. In this paper we propose a fully unsupervised keyphrase extraction method that differs from the usual generic keyphrase extractor in how the keyphrases are formed. Our method first builds topically related word clusters, selects document keywords from them, and then expands the selected keywords into syntactically valid keyphrases. We evaluate our approach on a Croatian document collection annotated by eight human experts, taking into account the high subjectivity of the keyphrase extraction task. The proposed method reaches up to F1 = 44.5%, which is below the performance of human annotators but comparable to that of a supervised approach.
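The three stages named in the abstract (cluster words, select keywords, expand keywords into phrases) can be illustrated with a toy sketch. This is not the authors' implementation: the binary sentence-occurrence word vectors, the tiny English stopword list, the "most frequent word per cluster" selection rule, and the expansion over adjacent non-stopwords are all assumptions made for illustration, whereas the paper expands keywords into syntactically valid phrases for Croatian. The `f1` helper shows one common way to score extracted keyphrases against a single gold set; how the paper aggregates over its eight annotators is not stated here.

```python
from collections import Counter

# Toy stopword list (assumption; a real system would use a language-specific one).
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}

def tokenize(text):
    """Lowercase whitespace tokenizer that strips surrounding punctuation."""
    tokens = (t.strip(".,;:!?").lower() for t in text.split())
    return [t for t in tokens if t]

def word_vectors(sentences):
    """Map each content word to a binary vector of sentence occurrences."""
    tokenized = [set(tokenize(s)) for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks} - STOPWORDS)
    return {w: [1.0 if w in toks else 0.0 for toks in tokenized] for w in vocab}

def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(vectors, k, iterations=20):
    """Plain k-means over word vectors; returns the non-empty word clusters."""
    words = list(vectors)
    centroids = []
    for w in words:  # deterministic init: the first k distinct vectors
        if vectors[w] not in centroids:
            centroids.append(list(vectors[w]))
        if len(centroids) == k:
            break
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for w in words:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(vectors[w], centroids[i]))
            clusters[nearest].append(w)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                dims = zip(*(vectors[w] for w in members))
                centroids[i] = [sum(d) / len(members) for d in dims]
    return [c for c in clusters if c]

def expand(keyword, sentences):
    """Grow a keyword into its most frequent run of adjacent content words."""
    phrases = Counter()
    for s in sentences:
        toks = tokenize(s)
        for i, t in enumerate(toks):
            if t != keyword:
                continue
            lo, hi = i, i
            while lo > 0 and toks[lo - 1] not in STOPWORDS:
                lo -= 1
            while hi + 1 < len(toks) and toks[hi + 1] not in STOPWORDS:
                hi += 1
            phrases[" ".join(toks[lo:hi + 1])] += 1
    return phrases.most_common(1)[0][0]

def extract_keyphrases(sentences, k):
    """Cluster words, pick the most frequent word per cluster, expand it."""
    vectors = word_vectors(sentences)
    freq = Counter(t for s in sentences for t in tokenize(s))
    phrases = []
    for cluster in kmeans(vectors, k):
        keyword = max(cluster, key=lambda w: (freq[w], w))
        phrase = expand(keyword, sentences)
        if phrase not in phrases:
            phrases.append(phrase)
    return phrases

def f1(extracted, gold):
    """F1 of extracted keyphrases against one gold set (exact string match)."""
    extracted, gold = set(extracted), set(gold)
    tp = len(extracted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(extracted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Calling `extract_keyphrases(sentences, k)` on a list of sentence strings returns at most `k` phrases, one per non-empty cluster. Exact string matching in `f1` is strict; with several annotators, a score could be computed per annotator and averaged, which is one possible way to account for the subjectivity the abstract mentions.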
Pages: 340-347
Page count: 8