An automatic keyphrase extraction system for scientific documents

Cited by: 41
Authors
You, Wei [1]
Fontaine, Dominique [1]
Barthes, Jean-Paul [1]
Affiliation
[1] Univ Technologie de Compiegne, Ctr Rech Royallieu, HEUDIASYC UMR CNRS 6599, Compiegne, France
Keywords
Information retrieval; Automatic indexing; Keyphrases extraction; Candidate phrase identification; Scientific document processing;
DOI
10.1007/s10115-012-0480-2
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Automatic keyphrase extraction techniques play an important role in many tasks, including indexing, categorizing, summarizing, and searching. In this paper, we develop and evaluate an automatic keyphrase extraction system for scientific documents. Compared with previous work, our system concentrates on two important issues: (1) more precise location of potential keyphrases: a new candidate phrase generation method is proposed based on the core word expansion algorithm, which can reduce the size of the candidate set by about 75% without increasing the computational complexity; (2) overlap elimination for the output list: when a phrase and its sub-phrases coexist as candidates, an inverse document frequency feature is introduced for selecting the proper granularity. Additional new features are added for phrase weighting. Experiments based on real-world datasets were carried out to evaluate the proposed system. The results show the efficiency and effectiveness of the refined candidate set and demonstrate that the new features improve the accuracy of the system. The overall performance of our system compares favorably with other state-of-the-art keyphrase extraction systems.
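Illustrative note: the abstract does not spell out how the inverse document frequency feature chooses between a phrase and its sub-phrases, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a smoothed phrase-level IDF computed over a small reference corpus and a hypothetical rule: keep the longer phrase only when its IDF exceeds that of the sub-phrase by at least `min_gain`. The function names, the `min_gain` parameter, and the sample corpus are all invented for illustration.

```python
import math


def contains(words, sub):
    """True if the word list `sub` occurs contiguously inside `words`."""
    n = len(sub)
    return any(words[i:i + n] == sub for i in range(len(words) - n + 1))


def phrase_idf(phrase, corpus):
    """Smoothed IDF of a phrase over a corpus of lowercase document strings."""
    target = phrase.split()
    df = sum(1 for doc in corpus if contains(doc.split(), target))
    return math.log((1 + len(corpus)) / (1 + df))


def is_subphrase(short, long):
    """True if `short` is a strictly shorter contiguous part of `long`."""
    s, l = short.split(), long.split()
    return len(s) < len(l) and contains(l, s)


def eliminate_overlap(candidates, corpus, min_gain=0.5):
    """Hypothetical granularity rule (an assumption, not from the paper):
    when a phrase and one of its sub-phrases are both candidates, keep the
    longer phrase only if it is at least `min_gain` IDF units more specific;
    otherwise keep the shorter one."""
    idf = {c: phrase_idf(c, corpus) for c in candidates}
    dropped = set()
    for short in candidates:
        for long in candidates:
            if is_subphrase(short, long):
                gain = idf[long] - idf[short]
                dropped.add(short if gain >= min_gain else long)
    return [c for c in candidates if c not in dropped]


# Toy corpus and candidate list, invented for this example.
corpus = [
    "keyphrase extraction for scientific documents",
    "automatic keyphrase extraction system",
    "keyphrase extraction with neural models",
    "information retrieval and indexing",
]
candidates = [
    "keyphrase extraction",
    "keyphrase extraction system",
    "information retrieval",
]
kept_specific = eliminate_overlap(candidates, corpus, min_gain=0.5)
kept_general = eliminate_overlap(candidates, corpus, min_gain=1.0)
```

With the lower threshold the sketch keeps the longer phrase "keyphrase extraction system"; with the higher threshold it falls back to "keyphrase extraction". A real system would presumably combine such a feature with the other phrase weights the abstract mentions.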
Pages: 691-724
Page count: 34
Related papers
50 in total
  • [1] An automatic keyphrase extraction system for scientific documents
    Wei You
    Dominique Fontaine
    Jean-Paul Barthès
    [J]. Knowledge and Information Systems, 2013, 34 : 691 - 724
  • [2] Automatic Keyphrase Extraction from Persian Scientific Documents Using Semantic Relations
    Farahani, Bahare Davoodabadi
    Fatemi, Seied Omid
    Ghorbani, Mohsen
    [J]. 2019 27TH IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE 2019), 2019, : 1972 - 1978
  • [3] Automatic Keyphrase Extraction from Medical Documents
    Sarkar, Kamal
    [J]. PATTERN RECOGNITION AND MACHINE INTELLIGENCE, PROCEEDINGS, 2009, 5909 : 273 - 278
  • [4] Automatic keyphrase extraction for Arabic news documents based on KEA system
    Duwairi, Rehab
    Hedaya, Mona
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2016, 30 (04) : 2101 - 2110
  • [5] Automatic keyphrase extraction from scientific articles
    Su Nam Kim
    Olena Medelyan
    Min-Yen Kan
    Timothy Baldwin
    [J]. Language Resources and Evaluation, 2013, 47 : 723 - 742
  • [6] Automatic keyphrase extraction from scientific articles
    Kim, Su Nam
    Medelyan, Olena
    Kan, Min-Yen
    Baldwin, Timothy
    [J]. LANGUAGE RESOURCES AND EVALUATION, 2013, 47 (03) : 723 - 742
  • [7] Automatic keyphrase extraction from chinese news documents
    Wang, HF
    Li, SJ
    Yu, SW
    [J]. FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, PT 2, PROCEEDINGS, 2005, 3614 : 648 - 657
  • [8] Automatic Keyphrase Extraction from Scientific Documents Using N-gram Filtration Technique
    Kumar, Niraj
    Srinathan, Kannan
    [J]. DOCENG'08: PROCEEDINGS OF THE EIGHTH ACM SYMPOSIUM ON DOCUMENT ENGINEERING, 2008, : 199 - 208
  • [9] Automatic keyphrase annotation of scientific documents using Wikipedia and genetic algorithms
    Joorabchi, Arash
    Mahdi, Abdulhussain E.
    [J]. JOURNAL OF INFORMATION SCIENCE, 2013, 39 (03) : 410 - 426
  • [10] A Study on Automatic Keyphrase Extraction and Its Refinement for Scientific Articles
    Lim, Yeonsoo
    Bong, Daehyeon
    Jung, Yuchul
    [J]. CURRENT TRENDS IN WEB ENGINEERING, ICWE 2019 INTERNATIONAL WORKSHOPS, 2020, 11609 : 18 - 21