News item extraction for text mining in web newspapers

Cited by: 4
Authors
Norvåg, K [1]
Oyri, R [1]
Affiliation
[1] Norwegian Univ Sci & Technol, Dept Comp & Informat Sci, N-7491 Trondheim, Norway
DOI: 10.1109/WIRI.2005.27
Chinese Library Classification (CLC) number: TP18 [Artificial intelligence theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
Web newspapers are a valuable source of information. To benefit more from this information, text mining techniques can be applied. However, because each newspaper page often covers many unrelated topics, page-based data mining will not always give useful results. To improve on complete-page mining, we present an approach based on extracting the individual news items from the web pages and mining these separately. Automatic news item extraction is a difficult problem, and in this paper we also provide strategies for solving that task. We study the quality of the news item extraction, and also provide results from clustering the extracted news items.
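The abstract does not describe how the extraction strategies work, so the following is only a rough illustration of the general idea (mine per news item instead of per page), not the authors' method. It assumes a hypothetical heuristic in which every h1 to h3 headline starts a new news item, and it groups the extracted items with a simple greedy clustering based on Jaccard term overlap. The names `extract_items`, `cluster`, and the threshold value `0.2` are hypothetical choices made for this sketch.

```python
import re
from html.parser import HTMLParser


class ItemSplitter(HTMLParser):
    """Hypothetical heuristic: each <h1>-<h3> headline opens a new news item."""

    HEADLINE_TAGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.items = []            # each item is [headline_text, body_text]
        self._in_headline = False

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADLINE_TAGS:
            self.items.append(["", ""])
            self._in_headline = True

    def handle_endtag(self, tag):
        if tag in self.HEADLINE_TAGS:
            self._in_headline = False

    def handle_data(self, data):
        if not self.items:
            return  # text before the first headline (menus, banners) is dropped
        slot = 0 if self._in_headline else 1
        self.items[-1][slot] += data


def extract_items(html):
    """Split one page into (headline, body) pairs with whitespace normalised."""
    parser = ItemSplitter()
    parser.feed(html)
    parser.close()
    return [(h.strip(), " ".join(b.split())) for h, b in parser.items]


def terms(text):
    """Lower-cased alphabetic tokens as a set (no stemming or stop-word removal)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(items, threshold=0.2):
    """Greedy single pass: join the first cluster whose seed item is similar enough."""
    clusters = []  # list of (seed_terms, member_indices)
    for i, (head, body) in enumerate(items):
        t = terms(head + " " + body)
        for seed, members in clusters:
            if jaccard(seed, t) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((t, [i]))
    return [members for _, members in clusters]


# Hypothetical example page mixing unrelated stories.
page = """
<div>Menu | Sports | Weather</div>
<h2>Storm hits coast</h2><p>A strong storm hit the coast overnight.</p>
<h2>Local team wins final</h2><p>The local team won the cup final.</p>
<h2>Storm damage assessed</h2><p>Officials assessed storm damage on the coast.</p>
"""
items = extract_items(page)
print(cluster(items))  # → [[0, 2], [1]]
```

Splitting at heading tags breaks down on pages whose headlines are styled with generic elements, which is one reason news item extraction is hard in practice. The example only shows why per-item mining helps: the two storm stories end up together and apart from the sports story, whereas the whole page would have been a single mixed document.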
Pages: 195-204
Number of pages: 10
Related papers (50 in total)
  • [1] Mining on Terms Extraction from Web News
    Hsu, Li-Fu
    [J]. COMPUTATIONAL COLLECTIVE INTELLIGENCE: TECHNOLOGIES AND APPLICATIONS, PT I, 2010, 6421 : 188 - 194
  • [2] The feature extraction of text mining based on Web
    Liu, LZ
    Chen, JJ
    Song, HT
    [J]. ICEMI'2003: PROCEEDINGS OF THE SIXTH INTERNATIONAL CONFERENCE ON ELECTRONIC MEASUREMENT & INSTRUMENTS, VOLS 1-3, 2003, : 547 - 550
  • [3] Research and realization of extraction algorithm on web text mining
    Yin, Shiqun
    Qu, Yuhui
    Ge, Jike
    Lan, Xiaohong
    [J]. IITA 2007: WORKSHOP ON INTELLIGENT INFORMATION TECHNOLOGY APPLICATION, PROCEEDINGS, 2007, : 278 - +
  • [4] Title-Based Extraction of News Contents for Text Mining
    Tan, Zhen
    He, Chunhui
    Fang, Yang
    Ge, Bin
    Xiao, Weidong
    [J]. IEEE ACCESS, 2018, 6 : 64085 - 64095
  • [5] Web News Data Extraction Technology Based on Text Keywords
    Zhang, Kun
    [J]. COMPLEXITY, 2021, 2021
  • [6] Unsupervised learning of mDTD extraction patterns for Web text mining
    Kim, D
    Jung, HM
    Lee, GG
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2003, 39 (04) : 623 - 637
  • [7] INFORMATION EXTRACTION VERSUS TEXT SEGMENTATION FOR WEB CONTENT MINING
    Fragkou, Pavlina
    [J]. INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, 2013, 23 (08) : 1109 - 1137
  • [8] Extremely local news: Community newspapers on the Web
    Marcus, J
    [J]. DATABASE, 1998, 21 (02) : 73 - 75
  • [9] Language independent web news extraction system based on text detection framework
    Wu, Yu-Chieh
    [J]. INFORMATION SCIENCES, 2016, 342 : 132 - 149
  • [10] Efficient Web Page Main Text Extraction towards Online News Analysis
    Zhou, Baoyao
    Xiong, Yuhong
    Liu, Wei
    [J]. ICEBE 2009: IEEE INTERNATIONAL CONFERENCE ON E-BUSINESS ENGINEERING, PROCEEDINGS, 2009, : 37 - 41