Information Extraction of Domain-specific Business Documents with Limited Data

Cited by: 2
Authors
Minh-Tien Nguyen [1, 2]
Le Thai Linh [1]
Dung Tien Le [1]
Nguyen Hong Son [1]
Do Hoang Thai Duong [1]
Bui Cong Minh [1]
Akira Shojiguchi [1]
Affiliations
[1] CINNAMON LAB, 10th Floor, Geleximco Bldg, 36 Hoang Cau, Hanoi, Vietnam
[2] Hung Yen Univ Technol & Educ, Hung Yen, Vietnam
Keywords
Information extraction; Document analysis
DOI
10.1109/IJCNN52387.2021.9534328
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Information extraction is a cornerstone of the digitization of office data, which requires converting unstructured data into structured data. However, in actual business applications, adapting common extraction systems to domain-specific documents is a major obstacle because training data is difficult to prepare. To overcome this issue, we introduce a model that employs pre-trained language models with a customized CNN layer for domain adaptation. The model is validated on three Japanese domain-specific data sets and two benchmark machine reading comprehension data sets (SQuADs). Experimental results confirm that our model achieves promising results that are applicable to actual business scenarios.
Pages: 8