Domain-Specific Hierarchical Text Classification for Supporting Automated Environmental Compliance Checking

Cited by: 37
Authors
Zhou, Peng [1]
El-Gohary, Nora [1]
Affiliations
[1] Univ Illinois, Dept Civil & Environm Engn, 205 N Mathews Ave, Urbana, IL 61801 USA
Funding
U.S. National Science Foundation
Keywords
Automated compliance checking; Semantic systems; Automated construction management systems; Natural language processing; Text classification; Machine learning; Ontology
DOI
10.1061/(ASCE)CP.1943-5487.0000513
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Automated environmental compliance checking requires automated extraction of rules from environmental regulatory textual documents such as energy conservation codes and EPA regulations. Automated rule extraction requires complex text processing and analysis for information extraction and subsequent formalization of the extracted information into computer-processable rules. In the proposed automated compliance checking (ACC) approach, the text is first classified into predefined categories before information extraction (IE). This has two advantages: irrelevant text is filtered out during text classification (TC), and text with similar semantic meaning is grouped, thereby improving the efficiency and accuracy of further IE and compliance reasoning (CR). The categories used for TC are predefined in a semantic TC topic hierarchy, and the classified text is subsequently used in semantic IE and semantic CR. This paper presents the proposed machine learning (ML)-based TC algorithm for classifying clauses in environmental regulatory documents based on the TC topic hierarchy. In developing the algorithm, different text preprocessing techniques, ML algorithms, and performance improvement strategies were tested and used. The final TC algorithm was tested on 10 environmental regulatory documents and evaluated in terms of precision and recall. On the testing data, the algorithm achieved approximately 97% average recall and 84% average precision.
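The abstract reports average recall and precision for clause-level classification. As a rough illustration of how such figures can be computed, the following is a minimal sketch, not the authors' evaluation code. It assumes each clause may carry zero or more category labels and that the average is an unweighted mean over categories (macro-average); the abstract does not state which averaging scheme was used. The function names and the labels "energy" and "water" are hypothetical.

```python
from collections import defaultdict


def per_category_precision_recall(gold, predicted):
    """Return {label: (precision, recall)} for parallel lists of label sets.

    gold[i] and predicted[i] are the sets of category labels assigned to
    clause i by the annotators and by the classifier, respectively.
    """
    tp = defaultdict(int)  # label predicted and correct
    fp = defaultdict(int)  # label predicted but not in gold
    fn = defaultdict(int)  # label in gold but not predicted
    for gold_labels, predicted_labels in zip(gold, predicted):
        for label in predicted_labels:
            if label in gold_labels:
                tp[label] += 1
            else:
                fp[label] += 1
        for label in gold_labels:
            if label not in predicted_labels:
                fn[label] += 1

    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        predicted_total = tp[label] + fp[label]
        gold_total = tp[label] + fn[label]
        precision = tp[label] / predicted_total if predicted_total else 0.0
        recall = tp[label] / gold_total if gold_total else 0.0
        scores[label] = (precision, recall)
    return scores


def macro_average(scores):
    """Unweighted mean of precision and recall over categories (assumed scheme)."""
    n = len(scores)
    avg_precision = sum(p for p, _ in scores.values()) / n
    avg_recall = sum(r for _, r in scores.values()) / n
    return avg_precision, avg_recall


# Hypothetical toy data: four clauses, two categories; the last clause is irrelevant.
gold = [{"energy"}, {"energy"}, {"water"}, set()]
predicted = [{"energy"}, {"energy", "water"}, {"water"}, {"energy"}]

scores = per_category_precision_recall(gold, predicted)
avg_precision, avg_recall = macro_average(scores)
print(f"average precision={avg_precision:.2f}, average recall={avg_recall:.2f}")
```

In this toy data the classifier never misses a relevant clause but sometimes over-assigns a category, which yields the same pattern as the reported results: recall higher than precision.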
Pages: 14