General purpose computer-assisted clustering and conceptualization

Cited by: 76
Authors
Grimmer, Justin [2]
King, Gary [1]
Affiliations
[1] Harvard Univ, Inst Quantitat Social Sci, Cambridge, MA 02138 USA
[2] Stanford Univ, Dept Polit Sci, Palo Alto, CA 94305 USA
DOI
10.1073/pnas.1018067108
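Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract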
We develop a computer-assisted method for the discovery of insightful conceptualizations, in the form of clusterings (i.e., partitions) of input objects. Each of the numerous fully automated methods of cluster analysis proposed in statistics, computer science, and biology optimizes a different objective function. Almost all are well defined, but how to determine before the fact which one, if any, will partition a given set of objects in an "insightful" or "useful" way for a given user is unknown and difficult, if not logically impossible. We develop a metric space of partitions from all existing cluster analysis methods applied to a given dataset (along with millions of other solutions we add based on combinations of existing clusterings) and enable a user to explore and interact with it, and to quickly reveal or prompt useful or insightful conceptualizations. In addition, although it is uncommon to do so in unsupervised learning problems, we offer and implement evaluation designs that make our computer-assisted approach vulnerable to being proven suboptimal in specific data types. We demonstrate that our approach facilitates more efficient and insightful discovery of useful information than expert human coders or many existing fully automated methods.
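As an illustration of what a "metric space of partitions" can mean in practice, the sketch below computes a distance between two clusterings of the same objects. The abstract does not say which distance is used; variation of information is assumed here only because it is a standard metric on partitions, and the function name and label-list input format are hypothetical.

```python
from collections import Counter
from math import log


def variation_of_information(labels_a, labels_b):
    """Distance between two partitions of the same objects.

    Each argument is a sequence of cluster labels, one per object.
    Assumption: variation of information, VI = H(A|B) + H(B|A), is used
    as the metric; the abstract itself does not name a specific distance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    n = len(labels_a)
    if n == 0:
        raise ValueError("partitions must be non-empty")

    size_a = Counter(labels_a)               # cluster sizes in partition A
    size_b = Counter(labels_b)               # cluster sizes in partition B
    joint = Counter(zip(labels_a, labels_b))  # sizes of pairwise overlaps

    vi = 0.0
    for (a, b), n_ab in joint.items():
        # p(a,b) * [log p(a,b)/p(a) + log p(a,b)/p(b)], with the common
        # factor 1/n cancelled inside each ratio.
        vi -= (n_ab / n) * (log(n_ab / size_a[a]) + log(n_ab / size_b[b]))
    return vi


if __name__ == "__main__":
    kmeans_like = [0, 0, 1, 1, 2, 2]
    hierarchical_like = [0, 0, 0, 1, 1, 1]
    print(variation_of_information(kmeans_like, hierarchical_like))
```

The distance is zero when two clusterings group the objects identically, regardless of how the clusters are labeled, so pairwise distances among many candidate clusterings can be placed in a matrix and projected to two dimensions for exploration.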
Pages: 2643-2650
Number of pages: 8