A simple and fast method for Named Entity context extraction from patents

Cited by: 15
Authors
Puccetti, Giovanni [1]
Chiarello, Filippo [2]
Fantoni, Gualtiero [3]
Affiliations
[1] Scuola Normale Super Pisa, Piazza Cavalieri 7, I-56126 Pisa, Italy
[2] Dept Energy Syst Terr & Construct Engn, Largo Lucio Lazzarino 2, I-56122 Pisa, Italy
[3] Dept Civil & Ind Engn, Largo Lucio Lazzarino 2, I-56122 Pisa, Italy
Keywords
Natural Language Processing; Information retrieval; Patents; Framework
DOI
10.1016/j.eswa.2021.115570
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Extracting relevant technical information from patents or technical literature is as valuable as it is challenging. The task involves extracting highly relevant information from a corpus of documents that have a particular structure and mix technical and legal jargon. Patents are the widest free source of technical information in which homogeneous entities can be found. From a technical perspective, existing approaches rely on Named Entity Recognition (NER) and use Machine Learning techniques for Natural Language Processing (NLP). However, new approaches are needed because of the large amount of data, the complexity of the lexicon, the peculiarity of the document structure, and the scarcity of examples available to train the machine learning system. NER methods are improving their performance in many contexts, but a gap still exists when dealing with technical documentation. The aim of this work is to automatically create training sets for NER systems by exploiting the nature and structure of patents, an open and massive source of technical documentation. In particular, we focus on collecting the contexts in which users of the invention appear within patents. We then measure the extent to which we achieve this goal and discuss how well our method generalizes to other entities and documents.
Pages: 9