A semantics-based method for clustering of Chinese web search results

Cited by: 10
Authors
Zhang, Hui [1]
Wang, Deqing [1]
Wang, Li [2]
Bi, Zhuming [3]
Chen, Yong [1]
Affiliations
[1] Beihang Univ, Sch Comp Sci, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China
[2] Beihang Univ, Sch Econ & Management, Beijing 100191, Peoples R China
[3] Indiana Univ Purdue Univ, Dept Engn, Ft Wayne, IN 46805 USA
Funding
National Science Foundation (United States)
Keywords
search engine; Chinese online semantic clustering; vocabulary chain; semantic similarity; Chameleon algorithm; ENTERPRISE SYSTEMS; ALGORITHM; CONSTRUCTION; CARROT(2); MODELS; TREE;
DOI
10.1080/17517575.2013.857793
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Information explosion is a critical challenge to the development of modern information systems. In particular, when an information system operates over the Internet, it must cope with a volume of web information that is growing exponentially. Search engines such as Google and Baidu are essential tools for finding information on the Internet. Valuable information, however, is still likely to be submerged in the ocean of search results these tools return. By automatically clustering the results into groups by subject, a search engine with a clustering feature lets users select the most relevant results quickly. In this paper, we propose an online semantics-based method to cluster Chinese web search results. First, we employ a generalised suffix tree to extract the longest common substrings (LCSs) from search snippets. Second, we use HowNet to calculate the similarities of the words derived from the LCSs, and extract the most representative features by constructing a vocabulary chain. Third, we construct a vector of text features and calculate the snippets' semantic similarities. Finally, we improve the Chameleon algorithm to cluster the snippets. Extensive experimental results show that the proposed algorithm outperforms the suffix tree clustering method and other traditional clustering methods.
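The first step of the abstract, extracting longest common substrings from search snippets, can be illustrated with a minimal sketch. This is not the paper's implementation: the paper uses a generalised suffix tree, whereas the sketch below swaps in a simple pairwise dynamic-programming comparison, which is slower but easier to read. The function names `longest_common_substring` and `common_phrases`, the `min_len` parameter, and the sample snippets are all hypothetical.

```python
from itertools import combinations


def longest_common_substring(a: str, b: str) -> str:
    """Return one longest substring shared by a and b ("" if none).

    Assumption: plain O(len(a) * len(b)) dynamic programming, used here
    as a stand-in for the generalised suffix tree named in the abstract.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)  # match lengths for the previous row of a
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def common_phrases(snippets: list[str], min_len: int = 2) -> set[str]:
    """Collect the LCS of every snippet pair, keeping those of at least min_len characters."""
    phrases = set()
    for first, second in combinations(snippets, 2):
        lcs = longest_common_substring(first, second)
        if len(lcs) >= min_len:
            phrases.add(lcs)
    return phrases


# Hypothetical snippets, not taken from the paper's data set.
snippets = ["北京航空航天大学计算机学院", "航空航天大学招生信息", "计算机学院招生简章"]
candidate_phrases = common_phrases(snippets)
```

Working on characters rather than whitespace-separated tokens suits Chinese text, which has no word delimiters. The later steps (HowNet word similarity, the vocabulary chain, and the improved Chameleon clustering) are not sketched here because the abstract does not give enough detail to reproduce them.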
Pages: 147-165
Number of pages: 19