The "BigSE" Project: Lessons Learned from Validating Industrial Text Mining

Cited by: 0
Authors
Krishna, Rahul
Yu, Zhe
Agrawal, Amritanshu
Dominguez, Manuel [1]
Wolf, David [1]
Affiliation
[1] LexisNexis, Raleigh, NC 27606 USA
Keywords
E-Discovery; Software Engineering; Testing
DOI
10.1145/2896825.2896836
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
As businesses rely more heavily on big data analytics, it becomes increasingly important to test the choices made within their data miners. This paper reports lessons learned from the BigSE Lab, an industrial/university collaboration that augments industrial activity with low-cost testing of data miners by graduate students. BigSE is an experiment in academic/industrial collaboration. Funded by a gift from LexisNexis, BigSE has no specific deliverables; instead, it is driven by the research question "What can industry and academia learn from each other?". Built on open-source data and tools, this work produces (a) greater exposure of commercial engineers to state-of-the-art methods and (b) greater exposure of students to industrial text mining methods (plus research papers that discuss how to improve those methods). The results so far are encouraging. Students at the BigSE Lab have found numerous "standard" text mining choices that could be replaced by simpler, less resource-intensive methods. That work has also found additional text mining choices that could significantly improve the performance of industrial data miners.
Pages: 65-71
Page count: 7