Improving topic modeling through homophily for legal documents

Authors
Kazuki Ashihara
Cheikh Brahim El Vaigh
Chenhui Chu
Benjamin Renoust
Noriko Okubo
Noriko Takemura
Yuta Nakashima
Hajime Nagahara
Affiliations
[1] Osaka University, Graduate of Information Science and Technology
[2] Inria, Institute for Datability Science
[3] IRISA, Graduate School of Law and Politics
[4] Osaka University
[5] Osaka University
Keywords
Homophily network; Topic modeling; Legal documents
DOI
Not available
Abstract
Topic modeling that automatically assigns topics to legal documents is important in computational law. The relevance of the modeled topics depends strongly on the legal context in which they are used. At the same time, references to laws and prior cases are key elements on which judges rely when ruling on a case. Taken together, these references form a network whose structure can be studied with network analysis. The content of the referenced documents may not always be accessible; even then, the reference structure itself indicates that documents share latent similar characteristics. We propose to use this latent structure, through document homophily, to improve topic modeling of law cases. In this paper, we explore homophily networks extracted from two types of references, prior cases and statute laws, to enhance topic modeling on legal case documents. To create these networks, we conduct a detailed analysis of the COLIEE dataset, which consists of rich legal cases. The homophily networks consist of nodes for legal cases and weighted edges between the case nodes for the two families of references. We further propose models that use the edge weights for topic modeling. In particular, we propose a cutting model and a weighting model to improve the relational topic model (RTM). The cutting model uses edges with weights higher than a threshold as document links in RTM; the weighting model uses the edge weights to weight the link probability function in RTM. The weights can be obtained either from co-citations or from the cosine similarity based on an embedding of the homophily networks. Experiments show that using the homophily networks for topic modeling significantly outperforms previous studies, and that the weighting model is more effective than the cutting model.
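The two models described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy citation matrix, the threshold value, and the choice to multiply the logistic RTM link probability by an edge weight in [0, 1] are all assumptions made for illustration, since the abstract does not give the exact formulas.

```python
import numpy as np

def cocitation_weights(refs):
    """refs[i, j] = 1 if case i cites reference j (toy input, assumed format).
    Returns a case-by-case matrix counting the references each pair shares."""
    weights = refs @ refs.T
    np.fill_diagonal(weights, 0)  # a case is not linked to itself
    return weights

def cut_edges(weights, threshold):
    """Cutting model (sketch): keep only pairs whose weight exceeds the
    threshold and treat them as unweighted document links for RTM."""
    rows, cols = np.where(np.triu(weights, k=1) > threshold)
    return list(zip(rows.tolist(), cols.tolist()))

def weighted_link_prob(theta_a, theta_b, eta, nu, weight):
    """Weighting model (sketch): logistic RTM link probability on the
    element-wise product of two topic proportions, scaled by an edge
    weight assumed to lie in [0, 1]. The scaling rule is an assumption."""
    logit = eta @ (theta_a * theta_b) + nu
    return weight / (1.0 + np.exp(-logit))

# Hypothetical data: three cases citing three references.
refs = np.array([[1, 1, 0],
                 [1, 1, 1],
                 [0, 0, 1]])
weights = cocitation_weights(refs)
links = cut_edges(weights, threshold=1)
```

Here `links` holds the case pairs that survive the cut, while `weighted_link_prob` keeps every pair and lets weaker edges contribute less. Cosine similarities from a network embedding could be passed as `weight` in place of normalized co-citation counts.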