Domain Specific Features Driven Information Extraction from Web Pages of Scientific Conferences

Cited by: 0
Authors
Andruszkiewicz, Piotr [1]
Hazan, Rafal [1]
Affiliations
[1] Warsaw Univ Technol, Inst Comp Sci, Warsaw, Poland
DOI
10.1007/978-3-319-77113-7_32
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we describe information extraction from the web pages of scientific conferences. We enrich previously known features with new features specific to this domain and show their importance in the extraction process. Moreover, we investigate various data representation models, e.g., based on single tokens or on sequences, in order to find the best configuration for this task and to establish a new baseline on a publicly available corpus.
Pages: 405-417
Page count: 13
Related papers
50 in total
  • [1] Information Extraction from Web pages
    Novotny, Robert
    Vojtas, Peter
    Maruscak, Dusan
    [J]. 2009 IEEE/WIC/ACM INTERNATIONAL JOINT CONFERENCES ON WEB INTELLIGENCE (WI) AND INTELLIGENT AGENT TECHNOLOGIES (IAT), VOL 3, 2009, : 121 - +
  • [2] Information extraction from Web pages using presentation regularities and domain knowledge
    Vadrevu, Srinivas
    Gelgi, Fatih
    Davulcu, Hasan
    [J]. WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2007, 10 (02): : 157 - 179
  • [3] Information Extraction from Web Pages Using Presentation Regularities and Domain Knowledge
    Srinivas Vadrevu
    Fatih Gelgi
    Hasan Davulcu
    [J]. World Wide Web, 2007, 10 : 157 - 179
  • [4] Visual extraction of information from web pages
    Della Penna, Giuseppe
    Magazzeni, Daniele
    Orefice, Sergio
    [J]. JOURNAL OF VISUAL LANGUAGES AND COMPUTING, 2010, 21 (01): : 23 - 32
  • [5] An open platform for collecting domain specific web pages and extracting information from them
    Karkaletsis, V
    Spyropoulos, CD
    [J]. Knowledge Mining, 2005, 185 : 147 - 157
  • [6] Extract Knowledge from Web Pages in a Specific Domain
    Lu, Yihong
    Yu, Shuiyuan
    Shi, Minyong
    Li, Chunfang
    [J]. KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT (KSEM 2018), PT I, 2018, 11061 : 117 - 124
  • [7] Prioritization of Domain-Specific Web Information Extraction
    Huang, Jian
    Yu, Cong
    [J]. PROCEEDINGS OF THE TWENTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-10), 2010, : 1327 - 1333
  • [8] Building intelligent systems for mining information extraction rules from Web pages by using domain knowledge
    Seo, H
    Yang, J
    Choi, J
    [J]. ISIE 2001: IEEE INTERNATIONAL SYMPOSIUM ON INDUSTRIAL ELECTRONICS PROCEEDINGS, VOLS I-III, 2001, : 322 - 327
  • [9] Bootstrapping Information Extraction from Semi-structured Web Pages
    Carlson, Andrew
    Schafer, Charles
    [J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PART I, PROCEEDINGS, 2008, 5211 : 195 - +
  • [10] KEYWORD EXTRACTION OF WEB PAGES BASED ON DOMAIN THESAURUS
    He, Guowan
    Wang, Jie
    Zhang, Yafeng
    Peng, Yan
    [J]. 2014 IEEE 3RD INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS (CCIS), 2014, : 310 - 314