Data Quality for Deep Learning of Judgment Documents: An Empirical Study

Times cited: 0
Authors
Liu, Jiawei [1 ,2 ]
Wang, Dong [2 ]
Wang, Zhenzhen [2 ,3 ]
Chen, Zhenyu [1 ,2 ]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Software Testing Engn Lab Jiangsu Prov, Nanjing, Peoples R China
[3] Jinling Inst Technol, Sch Software, Nanjing, Peoples R China
Source
Funding
National Natural Science Foundation of China;
Keywords
Judgment document; Deep learning; Quality measurement; Natural language processing;
DOI
10.1007/978-981-15-3412-6_5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Advances in hardware have made it possible to obtain high-definition data through highly sophisticated algorithms. Deep learning has emerged and is now widely used in many fields, and the judicial domain is no exception. As the carrier of litigation activities, judgment documents record the process and outcomes of cases heard by the people's courts, and their quality directly affects the fairness and credibility of the law. Interpretability is an indispensable dimension for measuring the quality of judgment documents. Unfortunately, because of various uncontrollable factors in processes such as data transmission and storage, the training data set is usually of poor quality. In addition, because the distribution of case data is severely imbalanced, data augmentation is essential for generating data for low-frequency cases. Based on the existing data set and its application scenarios, we identify data quality issues in four dimensions, systematically investigate each one to determine its impact on the data set, and then compare the four dimensions to find out which one damages the data set most.
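The abstract notes that case data are severely imbalanced and that low-frequency cases need augmentation. As a minimal, generic sketch (not the authors' method, which this record does not describe), the snippet below measures class imbalance as the ratio of the largest to the smallest class count and then applies naive random oversampling. The case labels, document IDs, and function names are hypothetical. Exact duplication is only a baseline: augmentation of judgment text would normally generate new textual variants rather than copies.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Ratio of the most frequent class count to the least frequent one."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(samples, labels, seed=0):
    """Naive random oversampling (hypothetical helper): duplicate randomly
    chosen samples of each minority class until every class has as many
    samples as the largest class."""
    rng = random.Random(seed)
    by_label = {}
    for x, y in zip(samples, labels):
        by_label.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_label.values())
    out_x, out_y = [], []
    for y, xs in by_label.items():
        # Draw extra copies (with replacement) from this class's own samples.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Hypothetical toy corpus: document IDs labelled with a case type.
docs = ["doc_a1", "doc_a2", "doc_a3", "doc_a4", "doc_b1", "doc_c1"]
cases = ["theft", "theft", "theft", "theft", "fraud", "bribery"]

print(imbalance_ratio(cases))      # 4.0
aug_docs, aug_cases = random_oversample(docs, cases)
print(Counter(aug_cases))
print(imbalance_ratio(aug_cases))  # 1.0
```

Oversampling only rebalances the label distribution. It does not repair the corruption introduced during transmission or storage, which is why data quality and augmentation are treated as separate concerns.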
Pages: 43-50
Number of pages: 8