Web Information Extraction for content augmentation

Authors
Janevski, A [1]
Dimitrova, N [1]
Affiliation
[1] Philips Res USA, Briarcliff Manor, NY 10510 USA
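Keywords
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract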
Today, users have to cope with an overwhelming number of TV channels and Web content sources. We introduce automatic content augmentation, a novel approach to contextual information extraction on behalf of the user, where the context is provided by the primary content source (i.e., the TV channel) and tailored by the user's preferences. A key aspect of this approach is Web Information Extraction (WebIE), which automatically derives structured information from unstructured Web documents. Our system executes WebIE tasks, each an instantiation of WebIE rules, our generic document processors. We present two WebIE approaches: Diffusion WebIE, which crawls a wide set of Web pages and extracts information from a subset of the pertinent pages; and Laser WebIE, which accesses a select set of Web pages and extracts narrowly defined information. We describe the architecture and implementation details of the system and provide detailed Laser WebIE examples.
Pages: A389-A392
Page count: 4