Improving topic modeling through homophily for legal documents

Authors
Kazuki Ashihara
Cheikh Brahim El Vaigh
Chenhui Chu
Benjamin Renoust
Noriko Okubo
Noriko Takemura
Yuta Nakashima
Hajime Nagahara
Affiliations
[1] Osaka University, Graduate School of Information Science and Technology
[2] Inria, Institute for Datability Science
[3] IRISA, Graduate School of Law and Politics
[4] Osaka University
[5] Osaka University
Source
Applied Network Science, 2020, 5 (01)
Keywords
Homophily network; Topic modeling; Legal documents
Abstract
Topic modeling that automatically assigns topics to legal documents is important in computational law. The relevance of the modeled topics depends strongly on the legal context in which they are used. At the same time, references to laws and prior cases are key elements for judges ruling on a case. Taken together, these references form a network whose structure can be studied with network analysis. The content of the referenced documents may not always be accessible; even then, the reference structure itself indicates that documents share latent similar characteristics. We propose to use this latent structure, in the form of document homophily, to improve topic modeling of legal cases. In this paper, we explore homophily networks extracted from two types of references, prior cases and statute laws, to enhance topic modeling on legal case documents. We conduct a detailed analysis of a dataset of rich legal cases, the COLIEE dataset, to create these networks. The homophily networks consist of nodes for legal cases and weighted edges for the two families of references between the case nodes. We further propose models that use the edge weights for topic modeling. In particular, we propose a cutting model and a weighting model to improve the relational topic model (RTM). The cutting model uses edges with weights higher than a threshold as document links in RTM; the weighting model uses the edge weights to weight the link probability function in RTM. The weights can be obtained either from co-citations or from the cosine similarity based on an embedding of the homophily networks. Experiments show that using the homophily networks for topic modeling significantly outperforms previous studies, and that the weighting model is more effective than the cutting model.
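The difference between the cutting model and the weighting model described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The case names, the statute identifiers, the normalisation of weights by the maximum weight, and the sigmoid form of the link probability are all assumptions made for illustration; the abstract states only that edges above a threshold become links (cutting) and that edge weights scale the link probability function (weighting).

```python
from itertools import combinations
from math import exp


def cocitation_weights(citations):
    """Weight each pair of cases by the number of references they share.

    `citations` maps a case id to the set of statutes or prior cases it cites.
    Pairs with no shared reference get no edge.
    """
    weights = {}
    for a, b in combinations(sorted(citations), 2):
        shared = len(citations[a] & citations[b])
        if shared > 0:
            weights[(a, b)] = shared
    return weights


def cutting_links(weights, threshold):
    """Cutting model: keep only edges whose weight exceeds the threshold."""
    return {pair for pair, w in weights.items() if w > threshold}


def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))


def weighted_link_probability(theta_a, theta_b, eta, nu, weight, max_weight):
    """Weighting model (assumed form): scale a sigmoid link probability
    over the element-wise product of two topic vectors by the normalised
    edge weight."""
    score = sum(e * x * y for e, x, y in zip(eta, theta_a, theta_b)) + nu
    return (weight / max_weight) * sigmoid(score)


# Hypothetical toy data: three cases citing hypothetical statutes.
citations = {
    "case_A": {"s1", "s2", "s3"},
    "case_B": {"s2", "s3"},
    "case_C": {"s3", "s9"},
}
weights = cocitation_weights(citations)
links = cutting_links(weights, threshold=1)
prob = weighted_link_probability(
    theta_a=[0.5, 0.5], theta_b=[0.5, 0.5],
    eta=[0.0, 0.0], nu=0.0,
    weight=weights[("case_A", "case_B")],
    max_weight=max(weights.values()),
)
```

In this toy network the cutting model discards the two weight-1 edges entirely, whereas the weighting model would keep them with a reduced contribution. That is one plausible reading of why the abstract reports the weighting model as more effective.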
Related papers
  • [1] Improving topic modeling through homophily for legal documents
    Ashihara, Kazuki
    El Vaigh, Cheikh Brahim
    Chu, Chenhui
    Renoust, Benjamin
    Okubo, Noriko
    Takemura, Noriko
    Nakashima, Yuta
    Nagahara, Hajime
    APPLIED NETWORK SCIENCE, 2020, 5 (01)
  • [2] Legal Information as a Complex Network: Improving Topic Modeling Through Homophily
    Ashihara, Kazuki
    Chu, Chenhui
    Renoust, Benjamin
    Okubo, Noriko
    Takemura, Noriko
    Nakashima, Yuta
    Nagahara, Hajime
    COMPLEX NETWORKS AND THEIR APPLICATIONS VIII, VOL 2, 2020, 882 : 28 - 39
  • [3] Organizing Portuguese Legal Documents through Topic Discovery
    Vianna, Daniela
    de Moura, Edleno Silva
    PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022, : 3388 - 3392
  • [4] Clustering scientific documents with topic modeling
    Yau, Chyi-Kwei
    Porter, Alan
    Newman, Nils
    Suominen, Arho
    SCIENTOMETRICS, 2014, 100 (03) : 767 - 786
  • [5] A Topic Modeling for Clustering Arabic Documents
    Alkhafaji, Doaa Wahhab
    Al-Rashid, Sura
    PROCEEDING OF 2021 2ND INFORMATION TECHNOLOGY TO ENHANCE E-LEARNING AND OTHER APPLICATION (IT-ELA 2021), 2021, : 76 - 81
  • [6] Clustering scientific documents with topic modeling
    Chyi-Kwei Yau
    Alan Porter
    Nils Newman
    Arho Suominen
    Scientometrics, 2014, 100 : 767 - 786
  • [7] Improving Topic Modeling Performance through N-gram Removal
    Almgerbi, Mohamad
    De Mauro, Andrea
    Kahlawi, Adham
    Poggioni, Valentina
    2021 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY (WI-IAT 2021), 2021, : 162 - 169
  • [8] Improving Interpretations of Topic Modeling in Microblogs
    Alkhodair, Sarah A.
    Fung, Benjamin C. M.
    Rahman, Osmud
    Hung, Patrick C. K.
    JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 2018, 69 (04) : 528 - 540
  • [9] Online Subset Topic Modeling for Interactive Documents Exploration
    Li, Linwei
    Wu, Yaobo
    Ke, Yixiong
    Liu, Chaoying
    Jing, Yinan
    He, Zhenying
    Wang, Xiaoyang Sean
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2018, PT I, 2018, 10827 : 916 - 923
  • [10] Recurrent Coupled Topic Modeling over Sequential Documents
    Guo, Jinjin
    Cao, Longbing
    Gong, Zhiguo
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2022, 16 (01)