Rule-based information extraction for mechanical-electrical-plumbing-specific semantic web

Cited by: 30
Authors
Wu, Lang-Tao [1]
Lin, Jia-Rui [1]
Leng, Shuo [1]
Li, Jiu-Lin [2]
Hu, Zhen-Zhong [3]
Affiliations
[1] Tsinghua Univ, Dept Civil Engn, Beijing 100084, Peoples R China
[2] Beijing Urban Construction Grp Co Ltd, Beijing 100088, Peoples R China
[3] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Information extraction; MEP; Rule match; Named entity recognition; Relation extraction; Natural language understanding; Semantic web; MANAGEMENT; KNOWLEDGE; ONTOLOGY; OBJECTS;
DOI
10.1016/j.autcon.2021.104108
Chinese Library Classification
TU [Building Science]
Discipline classification code
0813
Abstract
Information extraction (IE), which aims to retrieve meaningful information from plain text, has been widely studied in general and professional domains to support downstream applications. However, because labeled data are scarce and professional mechanical, electrical and plumbing (MEP) information is complex, it is challenging to apply common deep learning IE methods to the MEP domain. To solve this problem, this paper proposes a rule-based approach to the MEP IE task. The approach comprises a "snowball" strategy for collecting large-scale MEP corpora, a suffix-based matching algorithm on text segments for named entity recognition (NER), and a dependency-path-based matching algorithm on the dependency tree for relationship extraction (RE). Two ideas for RE, called "meta linking" and "path filtering", are also proposed in order to discover as many out-of-pattern entities and relationships as possible. To verify the feasibility of the proposed approach, 65 MB of MEP corpora were collected as input, and an MEP semantic web consisting of 15,978 entities and 65,110 relationship triples was established, with an accuracy of 81% for entities and 75% for relationship triples. A comparison experiment between classical deep learning models and the proposed rule-based approach showed that, in terms of extraction precision, the proposed method performs 37% and 49% better than the selected deep learning NER and RE models, respectively.
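The abstract names suffix-based matching on text segments as the NER step but does not give its details. As a rough illustration only, the sketch below shows one way such matching could work: a token that appears in a suffix lexicon is treated as the head of an entity, and the span is extended leftwards over modifier tokens. The lexicon `MEP_SUFFIXES`, the list `BOUNDARY_WORDS`, the function `match_entities`, and the `max_len` limit are all hypothetical and are not taken from the paper.

```python
import re
from typing import List, Set

# Hypothetical suffix lexicon (assumption): head nouns that end an MEP entity name.
MEP_SUFFIXES: Set[str] = {"pump", "valve", "duct", "pipe", "fan"}

# Hypothetical boundary words (assumption): function words that stop leftward extension.
BOUNDARY_WORDS: Set[str] = {"the", "a", "an", "of", "to", "and", "is", "in", "by", "with"}


def tokenize(segment: str) -> List[str]:
    """Lowercase the segment and split it into simple word tokens."""
    return re.findall(r"[a-z0-9-]+", segment.lower())


def match_entities(segment: str, max_len: int = 3) -> List[str]:
    """Return candidate entity names whose last token is a known suffix.

    For every token found in MEP_SUFFIXES, extend the span to the left
    until a boundary word, the start of the segment, or max_len tokens.
    """
    tokens = tokenize(segment)
    entities: List[str] = []
    for end, token in enumerate(tokens):
        if token not in MEP_SUFFIXES:
            continue
        start = end
        while (
            start > 0
            and end - start + 1 < max_len
            and tokens[start - 1] not in BOUNDARY_WORDS
        ):
            start -= 1
        entities.append(" ".join(tokens[start : end + 1]))
    return entities


if __name__ == "__main__":
    text = "The chilled water pump is connected to the supply pipe by a gate valve."
    print(match_entities(text))
```

A sketch this simple will also absorb verbs or other non-modifier words that directly precede a suffix, which is one reason a real rule set would need richer boundary rules than the word list assumed here.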
Pages: 14