A simple and fast method for Named Entity context extraction from patents

Cited by: 15
Authors
Puccetti, Giovanni [1 ]
Chiarello, Filippo [2 ]
Fantoni, Gualtiero [3 ]
Affiliations
[1] Scuola Normale Super Pisa, Piazza Cavalieri 7, I-56126 Pisa, Italy
[2] Dept Energy Syst Terr & Construct Engn, Largo Lucio Lazzarino 2, I-56122 Pisa, Italy
[3] Dept Civil & Ind Engn, Largo Lucio Lazzarino 2, I-56122 Pisa, Italy
Keywords
Natural Language Processing; Information retrieval; Patents; FRAMEWORK;
DOI
10.1016/j.eswa.2021.115570
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Extracting relevant technical information from patents or technical literature is as valuable as it is challenging. The task involves extracting highly relevant information from a corpus of documents with a particular structure and a mix of technical and legal jargon. Patents are the widest free source of technical information in which homogeneous entities can be found. From a technical perspective, existing approaches rely on Named Entity Recognition (NER) and use Machine Learning techniques for Natural Language Processing (NLP). However, given the large amount of data, the complexity of the lexicon, the peculiarity of the structure, and the scarcity of examples available to train machine learning systems, new approaches are needed. NER methods are improving their performance in many contexts, but a gap remains when dealing with technical documentation. The aim of this work is to automatically create training sets for NER systems by exploiting the nature and structure of patents, an open and massive source of technical documentation. In particular, we focus on collecting the contexts in which users of the invention appear within patents. We then measure to what extent we achieve our goal and discuss how well our method generalizes to other entities and documents.
Pages: 9
Related papers
50 in total
  • [21] A simple but effective span-level tagging method for discontinuous named entity recognition
    Zhejiang Dahua Technology Co., Ltd., Hangzhou
    310053, China
    Unknown
    310027, China
    Unknown
    200237, China
    Neural Comput. Appl., 13 (7187-7201):
  • [22] A simple but effective span-level tagging method for discontinuous named entity recognition
    Tingyun Mao
    Yaobin Xu
    Weitang Liu
    Jingchao Peng
    Lili Chen
    Mingwei Zhou
    Neural Computing and Applications, 2024, 36 : 7187 - 7201
  • [23] The Role of Global and Local Context in Named Entity Recognition
    Amalvy, Arthur
    Labatut, Vincent
    Dufour, Richard
    61ST CONFERENCE OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 2, 2023, : 714 - 722
  • [24] Chinese Named Entity Recognition with Inducted Context Patterns
    Pang, Wenbo
    Fan, Xiaozhong
    2009 THIRD INTERNATIONAL SYMPOSIUM ON INTELLIGENT INFORMATION TECHNOLOGY APPLICATION, VOL 3, PROCEEDINGS, 2009, : 608 - 611
  • [25] Context Hidden Markov Model for Named Entity Recognition
    Todorovic, Branimir T.
    Rancic, Svetozar R.
    Mulalic, Edin H.
    APPROXIMATION AND COMPUTATION: IN HONOR OF GRADIMIR V. MILOVANOVIC, 2011, 42 : 447 - +
  • [26] A Simple but Useful Multi-corpus Transferring Method for Biomedical Named Entity Recognition
    Li, Jiqiao
    Yuan, Chi
    Li, Zirui
    Wang, Huaiyu
    Tao, Feifei
    HEALTH INFORMATION PROCESSING, CHIP 2023, 2023, 1993 : 66 - 81
  • [27] Learning In-context Learning for Named Entity Recognition
    Chen, Jiawei
    Lu, Yaojie
    Lin, Hongyu
    Lou, Jie
    Jia, Wei
    Dai, Dai
    Wu, Hua
    Cao, Boxi
    Han, Xianpei
    Sun, Le
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 13661 - 13675
  • [28] Robustness of Named Entity Replacements for In-Context Learning
    Goodarzi, Saeed
    Kagita, Nikhil
    Minn, Dennis
    Wang, Shufan
    Dessi, Roberto
    Toshniwal, Shubham
    Williams, Adina
    Lanchantin, Jack
    Sinha, Koustuv
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 10914 - 10931
  • [29] COMCARE: A Collaborative Ensemble Framework for Context-Aware Medical Named Entity Recognition and Relation Extraction
    Jin, Myeong
    Choi, Sang-Min
    Kim, Gun-Woo
    ELECTRONICS, 2025, 14 (02):
  • [30] Extraction and Visualization of Numerical and Named Entity Information from a Large Number of Documents
    Murata, Masaki
    Iwatate, Masakazu
    Ichii, Koji
    Ma, Qing
    Shirado, Tamotsu
    Kanamaru, Toshiyuki
    Torisawa, Kentaro
    IEEE NLP-KE 2008: PROCEEDINGS OF INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING, 2008, : 122 - +