Auto-Grouping Emails For Faster E-Discovery

Authors
Joshi, Sachindra [1]
Contractor, Danish [1]
Ng, Kenney [2]
Deshpande, Prasad M. [1]
Hampp, Thomas [3]
Affiliations
[1] IBM Res, New Delhi, India
[2] IBM Software Grp, New York, NY USA
[3] IBM Software Grp, Frankfurt, Germany
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2011 / Volume 4 / Issue 12
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In this paper, we examine the application of various grouping techniques to help improve the efficiency and reduce the costs of an electronic discovery process. Specifically, we create coherent groups of email documents that characterize a syntactic theme, a semantic theme, or an email thread. All documents in a group can be reviewed together, leading to a faster and more consistent review. Syntactic grouping of emails is based on near-duplicate detection, whereas semantic grouping is based on identifying concepts in the email content using information extraction. Email thread detection is achieved by combining segmentation with near-duplicate detection. We present experimental results on the Enron corpus suggesting that these approaches can significantly reduce review time, and showing that high precision and recall in identifying the groups can be achieved. We also describe how these techniques are integrated into the IBM eDiscovery Analyzer product offering.
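The abstract does not say which near-duplicate detection algorithm the authors use. As an illustration of the general idea behind syntactic grouping only, the following is a minimal sketch that assumes word shingles compared by Jaccard similarity. The function names, the shingle size, and the 0.5 threshold are hypothetical choices for this example, not details taken from the paper.

```python
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles of a lowercased text (k is an assumed value)."""
    words = text.lower().split()
    if len(words) < k:
        # Texts shorter than k words become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_near_duplicates(emails: list[str], threshold: float = 0.5) -> list[list[int]]:
    """Group email indices whose shingle sets reach the (assumed) similarity threshold.

    Groups are formed transitively with a small union-find structure.
    """
    parent = list(range(len(emails)))

    def find(i: int) -> int:
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    shingle_sets = [shingles(e) for e in emails]
    # Pairwise comparison is quadratic; acceptable for a sketch, not for a large corpus.
    for i, j in combinations(range(len(emails)), 2):
        if jaccard(shingle_sets[i], shingle_sets[j]) >= threshold:
            parent[find(j)] = find(i)

    groups: dict[int, list[int]] = {}
    for i in range(len(emails)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


emails = [
    "Please review the attached contract draft before Friday",
    "Please review the attached contract draft before Monday",
    "Lunch at noon tomorrow works for me",
]
groups = group_near_duplicates(emails)
print(groups)
```

In this toy input the first two messages differ only in their last word, so they land in one group that a reviewer could assess together, while the third message stays on its own.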
Pages: 1284 - 1294
Page count: 11
Related papers
50 in total
  • [1] SECTION 1920 AND E-DISCOVERY
    Haft, Joshua A.
    UNIVERSITY OF PITTSBURGH LAW REVIEW, 2012, 74 (02) : 359 - 382
  • [2] Dispute Resolution and e-Discovery
    Luoma, Milton
    JOURNAL OF DIGITAL FORENSICS SECURITY AND LAW, 2012, 7 (03) : 111 - 113
  • [3] Information Retrieval for E-Discovery
    Lewis, David D.
    SIGIR 2010: PROCEEDINGS OF THE 33RD ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH DEVELOPMENT IN INFORMATION RETRIEVAL, 2010, : 913 - 913
  • [4] E-discovery gets real
    Krause, Jason
    ABA JOURNAL, 2007, 93 : 44 - 51
  • [5] Information Retrieval for E-Discovery
    Oard, Douglas W.
    Webber, William
    FOUNDATIONS AND TRENDS IN INFORMATION RETRIEVAL, 2013, 7 (2-3) : 99 - 237
  • [6] Semantic Middleware for E-Discovery
    Butler, Mark
    Reynolds, Dave
    Dickinson, Ian
    McBride, Brian
    Grosvenor, Dave
    Seaborne, Andy
    2009 IEEE THIRD INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC 2009), 2009, : 275 - 280
  • [7] Supervised multivariate learning with simultaneous feature auto-grouping and dimension reduction
    She, Yiyuan
    Shen, Jiahui
    Zhang, Chao
    JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2022, 84 (03) : 912 - 932
  • [8] Afterword: data, knowledge, and e-discovery
    Lewis, David
    ARTIFICIAL INTELLIGENCE AND LAW, 2010, 18 (04) : 481 - 486
  • [9] SANCTIONS FOR E-DISCOVERY VIOLATIONS: BY THE NUMBERS
    Willoughby, Dan H., Jr.
    Jones, Rose Hunter
    Antine, Gregory R.
    DUKE LAW JOURNAL, 2010, 60 (03) : 789 - 864
  • [10] Automation of legal sensemaking in e-discovery
    Hogan, Christopher
    Bauer, Robert
    Brassil, Dan
    ARTIFICIAL INTELLIGENCE AND LAW, 2010, 18 (04) : 431 - 457