Information Extraction of Domain-specific Business Documents with Limited Data

Cited by: 2
Authors
Minh-Tien Nguyen [1, 2]
Le Thai Linh [1]
Dung Tien Le [1]
Nguyen Hong Son [1]
Do Hoang Thai Duong [1]
Bui Cong Minh [1]
Akira Shojiguchi [1]
Affiliations
[1] CINNAMON LAB, 10th Floor, Geleximco Bldg, 36 Hoang Cau, Hanoi, Vietnam
[2] Hung Yen Univ Technol & Educ, Hung Yen, Vietnam
Keywords
Information extraction; Document analysis
DOI
10.1109/IJCNN52387.2021.9534328
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Information extraction is a cornerstone of the digitization of office data, which requires converting unstructured data into structured data. In real business cases, however, common extraction systems are hard to adapt to domain-specific documents because only limited training data can be prepared. To overcome this issue, we introduce a model that employs pre-trained language models with a customized CNN layer for domain adaptation. The model is validated on three Japanese domain-specific data sets and two benchmark machine reading comprehension data sets (SQuADs). Experimental results confirm that our model achieves promising results that are applicable to actual business scenarios.
Pages: 8