Low-Cost Supervision for Multiple-Source Attribute Extraction

Cited by: 0
Authors
Reisinger, Joseph [1]
Pasca, Marius [2]
Affiliations
[1] Univ Texas Austin, Austin, TX 78712 USA
[2] Google Inc, Mountain View, CA 94043 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Previous studies on extracting class attributes from unstructured text consider either Web documents or query logs as the source of textual data. Web search queries have been shown to yield attributes of higher quality. However, since many relevant attributes found in Web documents occur infrequently in query logs, Web documents remain an important source for extraction. In this paper, we introduce Bootstrapped Web Search (BWS) extraction, the first approach to extracting class attributes simultaneously from both sources. Extraction is guided by a small set of seed attributes and does not rely on further domain-specific knowledge. BWS is shown to improve both extraction precision and attribute relevance across 40 test classes.
Pages: 382 / +
Page count: 3