Extracting Definitions from Brazilian Legal Texts

Authors
Ferneda, Edilson [1]
do Prado, Hercules Antonio [1,2]
Batista, Augusto Herrmann [1,3]
Pinheiro, Marcello Sandi [4]
Affiliations
[1] Univ Catolica Brasilia, Grad Program Knowledge & IT Management, SGAN 916 Av W5, BR-70790160 Brasilia, DF, Brazil
[2] Embrapa Management & Strategy Secretariat, BR-7077090 Brasilia, DF, Brazil
[3] Minist Planning Budget & Management, Logist & Informat Technol Secretariat, BR-70046900 Brasilia, DF, Brazil
[4] Univ Fed Rio de Janeiro, COPPE, BR-2194197 Rio De Janeiro, RJ, Brazil
Keywords
Information extraction; Definition extraction; Natural language processing
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
To avoid ambiguity and to ensure, as far as possible, a strict interpretation of the law, legal texts usually define the specific lexical terms used within their discourse by means of normative rules. With an often large number of rules in effect in a given domain, extracting these definitions manually would be a costly undertaking. This paper presents an approach to this problem based on a variation of an automated natural language processing technique for Brazilian Portuguese texts. For the sake of generality, the proposed solution was developed to address the broader problem of building a glossary from domain-specific texts that contain definitions. The solution was applied to a corpus of texts from the telecommunications regulation domain, and the results are reported. The usual natural language processing pipeline was followed: preprocessing, segmentation, and part-of-speech tagging. A set of feature extraction functions is specified and used, along with reference glossary information on whether or not a text fragment is a definition, to train an SVM classifier. Finally, the definitions are extracted from the texts and evaluated on a testing corpus, which also contains the reference glossary annotations on definitions. The results are then discussed in light of other definition extraction techniques.
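The pipeline named in the abstract (segmentation, feature extraction, a linear classifier trained on labelled fragments, then extraction) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the cue phrases, the three features, the toy training sentences, and the hand-written hinge-loss trainer standing in for a real SVM library are hypothetical and are not taken from the paper. Part-of-speech tagging is omitted.

```python
import re

# Hypothetical cue phrases that may introduce definitions in Portuguese
# legal prose; the abstract does not list the paper's actual features.
CUES = ("considera-se", "entende-se por", "define-se", "denomina-se")

def segment(text):
    """Naive segmentation: split after '.' or ';' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.;])\s+", text) if s.strip()]

def features(sentence):
    """Map a sentence to a small numeric vector (all features are assumed)."""
    low = sentence.lower()
    return [
        1.0 if any(cue in low for cue in CUES) else 0.0,            # definitional cue
        1.0 if ":" in sentence else 0.0,                           # "term: gloss" layout
        1.0 if re.search(r"\b(deve|devem|é vedado)\b", low) else 0.0,  # obligation wording
        1.0,                                                       # bias term
    ]

def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Minimise L2-regularised hinge loss by sub-gradient descent (y in {-1, +1})."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            for j in range(len(w)):
                grad = lam * w[j] - (yi * xi[j] if margin < 1 else 0.0)
                w[j] -= lr * grad
    return w

def is_definition(w, sentence):
    """Classify a sentence by the sign of the linear decision function."""
    return sum(wj * xj for wj, xj in zip(w, features(sentence))) > 0

def extract_definitions(w, text):
    """Return the sentences of `text` that the classifier labels as definitions."""
    return [s for s in segment(text) if is_definition(w, s)]

# Toy labelled fragments standing in for the reference glossary annotations.
TRAIN = [
    ("Considera-se estação o conjunto de equipamentos necessários à telecomunicação.", 1),
    ("Entende-se por usuário a pessoa que utiliza o serviço.", 1),
    ("Rede: conjunto de meios de transmissão e comutação.", 1),
    ("A prestadora deve manter registro das reclamações.", -1),
    ("É vedado o uso de equipamento sem certificação.", -1),
    ("Os prazos devem ser contados em dias úteis.", -1),
]

w = train_linear_svm([features(s) for s, _ in TRAIN], [label for _, label in TRAIN])

sample = ("Define-se assinante a pessoa que contrata o serviço. "
          "A concessionária deve publicar as tarifas.")
glossary_candidates = extract_definitions(w, sample)
```

In practice the hand-written trainer would likely be replaced by an established SVM implementation, and the feature functions would draw on part-of-speech tags rather than surface cues alone.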
Pages: 631-646
Number of pages: 16