Improving topic modeling through homophily for legal documents

Authors
Kazuki Ashihara
Cheikh Brahim El Vaigh
Chenhui Chu
Benjamin Renoust
Noriko Okubo
Noriko Takemura
Yuta Nakashima
Hajime Nagahara
Affiliations
[1] Osaka University, Graduate of Information Science and Technology
[2] Inria, Institute for Datability Science
[3] IRISA, Graduate School of Law and Politics
[4] Osaka University
[5] Osaka University
Keywords
Homophily network; Topic modeling; Legal documents
Abstract
Topic modeling that automatically assigns topics to legal documents is important in computational law. The relevance of the modeled topics depends strongly on the legal context in which they are used. At the same time, references to laws and prior cases are key elements for judges ruling on a case. Taken together, these references form a network whose structure can be studied with network analysis. The content of the referenced documents may not always be accessible; even then, the reference structure itself indicates that documents share latent similar characteristics. We propose to use this latent structure, through document homophily, to improve topic modeling of law cases. In this paper, we explore homophily networks extracted from two types of references, prior cases and statute laws, to enhance topic modeling on legal case documents. We conduct a detailed analysis of a dataset of rich legal cases, the COLIEE dataset, to create these networks. The homophily networks consist of nodes for legal cases and weighted edges for the two families of references between the case nodes. We further propose models that use the edge weights for topic modeling. In particular, we propose a cutting model and a weighting model to improve the relational topic model (RTM). The cutting model uses edges with weights higher than a threshold as document links in RTM; the weighting model uses the edge weights to weight the link probability function in RTM. The weights can be obtained either from co-citations or from cosine similarity based on an embedding of the homophily networks. Experiments show that using the homophily networks for topic modeling significantly outperforms previous studies, and that the weighting model is more effective than the cutting model.
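
The two ways of using edge weights described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the binary citation-matrix layout, every function name, and in particular the multiplicative form used in `weighted_link_probability` are assumptions made for illustration. The abstract states only that the edge weights weight the RTM link probability function, not how. The sigmoid over the element-wise product of topic proportions follows the usual RTM formulation.

```python
import numpy as np


def cocitation_weights(citations):
    """Count shared references for every pair of cases.

    `citations` is a binary (n_cases x n_references) matrix; this layout
    is an assumption made for the sketch.
    """
    c = np.asarray(citations, dtype=float)
    weights = c @ c.T
    np.fill_diagonal(weights, 0.0)  # ignore self-pairs
    return weights


def cosine_weights(embeddings):
    """Cosine similarity between node embeddings (rows must be non-zero)."""
    e = np.asarray(embeddings, dtype=float)
    unit = e / np.linalg.norm(e, axis=1, keepdims=True)
    weights = unit @ unit.T
    np.fill_diagonal(weights, 0.0)  # ignore self-pairs
    return weights


def cutting_links(weights, threshold):
    """Cutting idea: keep only pairs whose weight exceeds the threshold."""
    n = weights.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if weights[i, j] > threshold]


def weighted_link_probability(z_bar_i, z_bar_j, eta, nu, weight):
    """Weighting idea: scale a sigmoid RTM-style link probability.

    Multiplying by `weight` (assumed to lie in [0, 1]) is a guess at one
    possible form, not the paper's definition.
    """
    score = np.dot(eta, np.asarray(z_bar_i) * np.asarray(z_bar_j)) + nu
    return weight / (1.0 + np.exp(-score))


# Toy data: three cases citing three references (hypothetical values).
citations = [[1, 1, 0],
             [1, 1, 1],
             [0, 0, 1]]
w = cocitation_weights(citations)
links = cutting_links(w, threshold=1)
p = weighted_link_probability([0.5, 0.5], [0.5, 0.5],
                              eta=[0.0, 0.0], nu=0.0, weight=0.5)
```

In this sketch the cutting variant discards weak edges before they reach the topic model, while the weighting variant keeps every edge and lets its weight modulate how strongly the link counts.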