Sequence-to-Sequence Models for Extracting Information from Registration and Legal Documents

Cited by: 0
Authors
Pires, Ramon [1 ,2 ]
de Souza, Fabio C. [1 ,3 ]
Rosa, Guilherme [1 ,3 ]
Lotufo, Roberto A. [1 ,3 ]
Nogueira, Rodrigo [1 ,3 ]
Affiliations
[1] NeuralMind Inteligencia Artificial, Sao Paulo, SP, Brazil
[2] Univ Estadual Campinas, Inst Comp, Campinas, SP, Brazil
[3] Univ Estadual Campinas, Sch Elect & Comp Engn, Campinas, SP, Brazil
Keywords
Information extraction; Sequence-to-sequence; Legal texts
DOI
10.1007/978-3-031-06555-2_6
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
A typical information extraction pipeline consists of token- or span-level classification models coupled with a series of pre- and post-processing scripts. In a production pipeline, requirements often change, with classes being added and removed, which leads to nontrivial modifications to the source code and the possible introduction of bugs. In this work, we evaluate sequence-to-sequence models as an alternative to token-level classification methods for information extraction of legal and registration documents. We finetune models that jointly extract the information and generate the output already in a structured format. Post-processing steps are learned during training, thus eliminating the need for rule-based methods and simplifying the pipeline. Furthermore, we propose a novel method to align the output with the input text, thus facilitating system inspection and auditing. Our experiments on four real-world datasets show that the proposed method is an alternative to classical pipelines. The source code is available at https://github.com/neuralmind-ai/information-extraction-t5.
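The abstract describes models that generate extracted information already in a structured format, plus a method to align that output back to the input text for auditing. The paper's exact serialization and alignment algorithm are not reproduced here; the following is a minimal sketch under assumed conventions (bracketed field tags for the target string, exact-then-fuzzy substring matching for alignment; all function names are illustrative):

```python
import re
import difflib


def build_target(fields: dict) -> str:
    """Serialize extracted fields into a flat, structured target string
    that a seq2seq model (e.g. T5) could be finetuned to generate."""
    return " ".join(f"[{k}] {v}" for k, v in fields.items())


def parse_target(output: str) -> dict:
    """Recover field/value pairs from the generated string."""
    pairs = re.findall(r"\[(\w+)\]\s*([^\[]+)", output)
    return {k: v.strip() for k, v in pairs}


def align(value: str, source: str):
    """Locate an extracted value in the source text for inspection:
    try an exact substring match first, then fall back to the longest
    fuzzy match. Returns a (start, end) span or None."""
    idx = source.find(value)
    if idx >= 0:
        return idx, idx + len(value)
    m = difflib.SequenceMatcher(None, source, value).find_longest_match(
        0, len(source), 0, len(value)
    )
    return (m.a, m.a + m.size) if m.size else None
```

Because the post-processing is expressed as a learnable string format rather than hand-written rules, adding or removing a class amounts to changing the training targets instead of modifying pipeline code.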
Pages: 83-95 (13 pages)
Related Papers (50 total)
  • [21] SUPERVISED ATTENTION IN SEQUENCE-TO-SEQUENCE MODELS FOR SPEECH RECOGNITION
    Yang, Gene-Ping
    Tang, Hao
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7222 - 7226
  • [22] Exploring Sequence-to-Sequence Models for SPARQL Pattern Composition
    Panchbhai, Anand
    Soru, Tommaso
    Marx, Edgard
    KNOWLEDGE GRAPHS AND SEMANTIC WEB, KGSWC 2020, 2020, 1232 : 158 - 165
  • [23] Neural Abstractive Text Summarization with Sequence-to-Sequence Models
    Shi, Tian
    Keneshloo, Yaser
    Ramakrishnan, Naren
    Reddy, Chandan K.
    ACM/IMS Transactions on Data Science, 2021, 2 (01):
  • [24] Analyzing Adversarial Attacks on Sequence-to-Sequence Relevance Models
    Parry, Andrew
    Froebe, Maik
    MacAvaney, Sean
    Potthast, Martin
    Hagen, Matthias
    ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT II, 2024, 14609 : 286 - 302
  • [25] Predicting the Mumble of Wireless Channel with Sequence-to-Sequence Models
    Huangfu, Yourui
    Wang, Jian
    Li, Rong
    Xu, Chen
    Wang, Xianbin
    Zhang, Huazi
    Wang, Jun
    2019 IEEE 30TH ANNUAL INTERNATIONAL SYMPOSIUM ON PERSONAL, INDOOR AND MOBILE RADIO COMMUNICATIONS (PIMRC), 2019, : 1043 - 1049
  • [26] ACOUSTIC-TO-WORD RECOGNITION WITH SEQUENCE-TO-SEQUENCE MODELS
    Palaskar, Shruti
    Metze, Florian
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 397 - 404
  • [27] Persian Keyphrase Generation Using Sequence-to-sequence Models
    Doostmohammadi, Ehsan
    Bokaei, Mohammad Hadi
    Sameti, Hossein
    2019 27TH IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE 2019), 2019, : 2010 - 2015
  • [28] Sequence-to-Sequence Models Can Directly Translate Foreign Speech
    Weiss, Ron J.
    Chorowski, Jan
    Jaitly, Navdeep
    Wu, Yonghui
    Chen, Zhifeng
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2625 - 2629
  • [29] Sequence-to-Sequence Models and Their Evaluation for Spoken Language Normalization of Slovenian
    Sepesy Maučec, Mirjam
    Verdonik, Darinka
    Donaj, Gregor
    Applied Sciences (Switzerland), 2024, 14 (20):
  • [30] Unleashing the True Potential of Sequence-to-Sequence Models for Sequence Tagging and Structure Parsing
    He, Han
    Choi, Jinho D.
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2023, 11 : 582 - 599