Information Extraction from Lengthy Legal Contracts: Leveraging Query-Based Summarization and GPT-3.5

Cited by: 0
Authors
Zin, May Myo [1]
Nguyen, Ha Thanh [1]
Satoh, Ken [1]
Sugawara, Saku [1]
Nishino, Fumihito [1]
Affiliation
[1] National Institute of Informatics, Tokyo, Japan
Source
Keywords
Information extraction; text summarization; lengthy legal contracts; zero-resource; large language models; unsupervised approach
DOI
10.3233/FAIA230963
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In the legal domain, extracting information from contracts is challenging, primarily because annotated data are scarce. In such settings, large language models (LLMs) such as the Generative Pre-trained Transformer (GPT) models offer a promising solution. However, the limited number of tokens these models can accept is a bottleneck when processing lengthy legal contracts. This paper presents an unsupervised two-step approach to these challenges. First, we propose a query-based summarization model that extracts the sentences pertinent to predefined queries, producing a concise representation of a lengthy contract. This summary preserves the core information while fitting within the token limit. Second, the generated summary is fed to GPT-3.5 for precise information extraction. The approach thus addresses both the token limitation and the absence of annotated training data (the zero-resource setting), enabling efficient and scalable information extraction from legal contracts. We compare our results with those of supervised models fine-tuned on domain-specific annotated data. Experimental results show that our approach achieves state-of-the-art performance without requiring domain-specific training data.
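The two-step pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the record does not say how sentences are scored, so the sketch assumes bag-of-words cosine similarity between each contract sentence and each query, a small hand-picked stopword list, and a whitespace word count as a rough stand-in for GPT-3.5 tokens. The helper names (`query_based_summary`, `build_prompt`), the prompt wording, and the sample contract are hypothetical, and the actual call to GPT-3.5 is omitted.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not taken from the paper.
STOPWORDS = {"a", "an", "and", "be", "by", "for", "in", "is", "of", "or",
             "shall", "the", "this", "to"}


def tokenize(text: str) -> list[str]:
    """Lowercased alphanumeric tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def query_based_summary(sentences: list[str], queries: list[str], budget: int) -> list[str]:
    """Step 1 (sketch): keep the sentences most relevant to any query.

    Each sentence is scored by its best cosine similarity to any query.
    Sentences are added greedily from the highest score down, skipping any
    that would push the total past `budget` whitespace-separated words
    (assumed here as a rough proxy for the model's real token limit).
    Sentences with zero similarity are never selected. The result keeps
    the original contract order so the summary stays readable.
    """
    query_vecs = [Counter(tokenize(q)) for q in queries]
    scored = []
    for idx, sentence in enumerate(sentences):
        vec = Counter(tokenize(sentence))
        score = max((cosine(vec, q) for q in query_vecs), default=0.0)
        scored.append((score, idx))

    chosen, used = [], 0
    for score, idx in sorted(scored, key=lambda t: (-t[0], t[1])):
        if score <= 0.0:
            break  # remaining sentences share no words with any query
        length = len(sentences[idx].split())
        if used + length <= budget:
            chosen.append(idx)
            used += length
    return [sentences[i] for i in sorted(chosen)]


def build_prompt(summary: list[str], field: str) -> str:
    """Step 2 (sketch): a hypothetical extraction prompt for GPT-3.5."""
    context = "\n".join(summary)
    return (
        f"Contract excerpt:\n{context}\n\n"
        f"Question: What is the {field} stated in this contract? "
        "Answer with the exact text span, or 'None' if it is absent."
    )


if __name__ == "__main__":
    contract = [
        "This Agreement is entered into by Acme Corp and Beta LLC.",
        "The parties agree to cooperate in good faith.",
        "This Agreement shall be governed by the laws of the State of New York.",
        "Either party may terminate this Agreement upon thirty days written notice.",
    ]
    queries = ["laws governing this agreement", "terminate with written notice"]
    summary = query_based_summary(contract, queries, budget=30)
    print(build_prompt(summary, "governing law"))
```

With a 30-word budget, this sketch keeps only the governing-law and termination sentences. A real system would count tokens with the model's own tokenizer (for example OpenAI's `tiktoken` library) rather than words, and would likely use a stronger relevance model than word overlap.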
Pages: 177-186
Number of pages: 10
Related papers (8 in total)
  • [1] Intertopic Information Mining for Query-Based Summarization
    Ouyang, You
    Li, Wenjie
    Li, Sujian
    Lu, Qin
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2010, 61 (05): 1062 - 1072
  • [2] A query-based medical information summarization system using ontology knowledge
    Chen, Ping
    Verma, Rakesh
    19TH IEEE INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS, PROCEEDINGS, 2006: 37 - +
  • [3] A Query-based Summarization Service from Multiple News Sources
    ShafieiBavani, Elaheh
    Ebrahimi, Mohammad
    Wong, Raymond
    Chen, Fang
    PROCEEDINGS 2016 IEEE INTERNATIONAL CONFERENCE ON SERVICES COMPUTING (SCC 2016), 2016: 42 - 49
  • [4] Query-Based Automatic Multi-document Summarization Extraction Method for Web Pages
    He, Qi
    Hao, Hong-Wei
    Yin, Xu-Cheng
    PROCEEDINGS OF THE 2011 2ND INTERNATIONAL CONGRESS ON COMPUTER APPLICATIONS AND COMPUTATIONAL SCIENCE, VOL 1, 2012, 144 : 107 - 112
  • [5] Evaluating the OpenAI's GPT-3.5 Turbo's performance in extracting information from scientific articles on diabetic retinopathy
    Gue, Celeste Ci Ying
    Rahim, Noorul Dharajath Abdul
    Rojas-Carabali, William
    Agrawal, Rupesh
    Palvannan, R. K.
    Abisheganaden, John
    Yip, Wan Fen
    SYSTEMATIC REVIEWS, 2024, 13 (01)
  • [6] QBSUM: A large-scale query-based document summarization dataset from real-world applications
    Zhao, Mingjun
    Yan, Shengli
    Liu, Bang
    Zhong, Xinwang
    Hao, Qian
    Chen, Haolan
    Niu, Di
    Long, Bowei
    Guo, Weidong
    COMPUTER SPEECH AND LANGUAGE, 2021, 66
  • [7] A Query-Based Network for Rural Homestead Extraction from VHR Remote Sensing Images
    Wei, Ren
    Fan, Beilei
    Wang, Yuting
    Yang, Rongchao
    SENSORS, 2023, 23 (07)
  • [8] NLP-Based Query-Answering System for Information Extraction from Building Information Models
    Wang, Ning
    Issa, Raja R. A.
    Anumba, Chimay J.
    JOURNAL OF COMPUTING IN CIVIL ENGINEERING, 2022, 36 (03)