Extracting classification knowledge of Internet documents with mining term associations: A semantic approach

Affiliation: Natl Cheng Kung Univ, Tainan, Taiwan
Source: SIGIR Forum, pp. 241-249
Keywords: Algorithms - Computational linguistics - Data mining - Feature extraction - Hierarchical systems - Inference engines - Internet - Polynomials - Search engines
Abstract
In this paper, we present a system that extracts and generalizes terms from Internet documents to represent the classification knowledge of a given class hierarchy. We propose a measure, called support, that evaluates the importance of a term with respect to a class in the hierarchy. Given a threshold, terms with high support are selected as keywords of a class, and terms with low support are filtered out. To improve the recall of this approach, association rule mining is applied to mine the associations between terms. An inference model is built from these association relations and the previously computed supports of the terms in the class. We then present a polynomial-time inference algorithm that promotes a term strongly associated with a known keyword to a keyword. Experimental results on Internet documents collected from the Yam search engine show that the proposed methods help refine the classification knowledge and increase the recall of keyword selection.
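
The pipeline described in the abstract (support thresholding followed by association-based promotion) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything concrete here is an assumption: support is taken to be the fraction of a class's documents that contain a term, association strength is taken to be the usual rule confidence conf(k -> t), and the names `term_support`, `confidence`, `select_keywords`, and `DOCS` are hypothetical. It is not the authors' algorithm, only one plausible reading of it.

```python
def term_support(docs):
    """Assumed support: fraction of the class's documents containing each term."""
    counts = {}
    for doc in docs:
        for term in doc:
            counts[term] = counts.get(term, 0) + 1
    return {term: count / len(docs) for term, count in counts.items()}


def confidence(docs, a, b):
    """Rule confidence conf(a -> b): share of documents with a that also contain b."""
    with_a = [doc for doc in docs if a in doc]
    if not with_a:
        return 0.0
    return sum(1 for doc in with_a if b in doc) / len(with_a)


def select_keywords(docs, min_support, min_conf):
    """Pick high-support terms, then promote terms strongly associated with a keyword."""
    support = term_support(docs)
    keywords = {t for t, s in support.items() if s >= min_support}
    candidates = set(support) - keywords
    changed = True
    # Repeat until no term is promoted. Each pass either promotes at least one
    # term or stops, so the loop runs at most len(support) times (polynomial).
    while changed:
        changed = False
        for term in sorted(candidates):
            if any(confidence(docs, k, term) >= min_conf for k in keywords):
                keywords.add(term)
                candidates.discard(term)
                changed = True
    return keywords


# Toy documents for one class, each reduced to its set of terms (made-up data).
DOCS = [
    {"stock", "market", "dividend"},
    {"stock", "market", "dividend"},
    {"stock", "market"},
    {"stock", "bond"},
    {"weather"},
]

print(select_keywords(DOCS, min_support=0.7, min_conf=0.7))
print(select_keywords(DOCS, min_support=0.7, min_conf=0.6))
```

With a support threshold of 0.7 only "stock" passes the filter. At a confidence threshold of 0.7, "market" is promoted through its association with "stock"; lowering the threshold to 0.6 also promotes "dividend" through the newly promoted "market", which shows how promotion can chain and thereby raise recall.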