Mining Subjective Properties on the Web

Times cited: 10
Authors
Trummer, Immanuel [1]
Halevy, Alon [2]
Lee, Hongrae [2]
Sarawagi, Sunita [2, 3]
Gupta, Rahul [2]
Affiliations
[1] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
[2] Google Inc, Mountain View, CA USA
[3] Indian Inst Technol, Bombay, Maharashtra, India
Keywords
Text mining; subjective properties; user behavior model
DOI
10.1145/2723372.2750548
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Despite recent advances in Web search that answer queries from structured data, search engines are still limited to queries with an objective answer, such as EUROPEAN CAPITALS or WOODY ALLEN MOVIES. However, many queries are subjective, such as SAFE CITIES or CUTE ANIMALS. The knowledge bases underlying search engines do not contain answers to these queries because such queries have no ground truth. We describe the SURVEYOR system, which mines the dominant opinion held by authors of Web content about whether a subjective property applies to a given entity. SURVEYOR's evidence consists of statements extracted from Web text that either support the property or claim its negation. The key challenge SURVEYOR faces is that simply counting positive and negative statements does not suffice, because Web content tends to be authored with multiple hidden biases. SURVEYOR therefore employs a probabilistic model of how content is authored on the Web. For example, this model accounts for correlations between the subjective property and the frequency with which it is mentioned on the Web. The model's parameters are specialized to each property and entity type. SURVEYOR processed a large Web snapshot within a few hours, producing opinions for over 4 billion entity-property combinations. We selected a subset of 500 entity-property combinations and compared our results to the dominant opinion of a large number of Amazon Mechanical Turk (AMT) workers. SURVEYOR's predictions match the AMT results in 77% of all cases (and in 87% of test cases where inter-worker agreement is high), significantly outperforming competing approaches.
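To illustrate why counting statements can mislead, here is a hypothetical toy model in Python. It is not SURVEYOR's actual model; it only assumes that the numbers of positive and negated statements about an entity-property pair are Poisson-distributed, with rates that depend on whether the dominant opinion is that the property applies. The class `PoissonAuthoringModel`, the functions `majority_vote` and `posterior_applies`, and all rate values are invented for illustration and stand in for the per-property, per-entity-type parameters the abstract mentions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PoissonAuthoringModel:
    """Hypothetical toy model: statement counts depend on whether the
    dominant opinion is that the property applies (z=1) or not (z=0)."""
    pos_rate_if_applies: float   # expected positive statements when z=1
    neg_rate_if_applies: float   # expected negated statements when z=1
    pos_rate_if_not: float       # expected positive statements when z=0
    neg_rate_if_not: float       # expected negated statements when z=0
    prior_applies: float = 0.5   # prior probability P(z=1)


def poisson_logpmf(k: int, rate: float) -> float:
    # log P(K = k) for K ~ Poisson(rate)
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def majority_vote(n_pos: int, n_neg: int) -> bool:
    # Naive baseline: the property applies if positives outnumber negations.
    return n_pos > n_neg


def posterior_applies(m: PoissonAuthoringModel, n_pos: int, n_neg: int) -> float:
    # Log joint probability of the observed counts under each hypothesis.
    log_applies = (math.log(m.prior_applies)
                   + poisson_logpmf(n_pos, m.pos_rate_if_applies)
                   + poisson_logpmf(n_neg, m.neg_rate_if_applies))
    log_not = (math.log(1.0 - m.prior_applies)
               + poisson_logpmf(n_pos, m.pos_rate_if_not)
               + poisson_logpmf(n_neg, m.neg_rate_if_not))
    # Bayes' rule expressed as a logistic function of the log-odds.
    return 1.0 / (1.0 + math.exp(log_not - log_applies))


# Illustrative, made-up parameters for a single property/entity-type pair.
model = PoissonAuthoringModel(5.0, 0.2, 1.5, 1.0)
print(majority_vote(2, 1))                        # True
print(round(posterior_applies(model, 2, 1), 3))   # 0.13
```

With these made-up rates, two positive statements and one negation win a majority vote. However, the model assigns only about a 13% posterior probability that the property applies, because an entity for which the property truly applies would be expected to attract more positive mentions and almost no negations.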
Pages: 1745-1760
Page count: 16