Web Information Extraction for content augmentation

Times cited: 0

Authors:

Janevski, A [1]

Dimitrova, N [1]

Affiliation:

[1] Philips Res USA, Briarcliff Manor, NY 10510 USA

Source:

IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL I AND II, PROCEEDINGS | 2002

Keywords:

DOI:

Not available

CLC classification number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

Today users have to cope with an overwhelming number of TV channels and Web content sources. We introduce automatic content augmentation, a novel approach to contextual information extraction on behalf of the user, where the context is provided by the primary content source (i.e., the TV channel) and tailored by the user's preferences. A key aspect of this approach is Web Information Extraction (WebIE), which automatically derives structured information from unstructured Web documents. Our system executes WebIE tasks, each an instantiation of WebIE rules (our generic document processors). We present two WebIE approaches: Diffusion WebIE, which crawls a wide set of Web pages and extracts information from a subset of the pertinent pages; and Laser WebIE, which accesses a select set of Web pages and extracts narrowly defined information. We describe the architecture and implementation details of the system and provide detailed Laser WebIE examples.
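The abstract separates generic WebIE rules from the tasks that instantiate them, and contrasts a narrow "Laser" strategy with a crawling "Diffusion" strategy. The Python sketch below illustrates that split under explicit assumptions: rules are modeled as regex-based functions, pages come from a hypothetical in-memory dictionary instead of HTTP, and pertinence is a simple keyword test. None of the names (`regex_rule`, `WebIETask`, `laser_webie`, `diffusion_webie`, `WEB`) come from the paper; this is an illustration, not the authors' implementation.

```python
import re
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]
# Hypothetical: a "rule" is a generic document processor, page text -> records.
Rule = Callable[[str], List[Record]]


def regex_rule(pattern: str) -> Rule:
    """Build a generic rule from a regex with named groups (an assumption)."""
    compiled = re.compile(pattern)

    def apply(text: str) -> List[Record]:
        # Each match becomes one structured record keyed by group name.
        return [m.groupdict() for m in compiled.finditer(text)]

    return apply


@dataclass
class WebIETask:
    """A task instantiates a generic rule with concrete parameters."""
    name: str
    rule: Rule
    urls: List[str] = field(default_factory=list)  # used by Laser-style tasks


def laser_webie(task: WebIETask, fetch: Callable[[str], str]) -> List[Record]:
    """Laser-style: visit only the task's select pages, extract narrow fields."""
    records: List[Record] = []
    for url in task.urls:
        records.extend(task.rule(fetch(url)))
    return records


def diffusion_webie(task: WebIETask,
                    fetch: Callable[[str], str],
                    links: Callable[[str], List[str]],
                    seeds: Iterable[str],
                    is_pertinent: Callable[[str], bool],
                    max_pages: int = 50) -> List[Record]:
    """Diffusion-style: crawl breadth-first, extract only from pertinent pages."""
    seen = set()
    queue = deque(seeds)
    records: List[Record] = []
    while queue and len(seen) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        text = fetch(url)
        if is_pertinent(text):
            records.extend(task.rule(text))
        queue.extend(links(url))  # follow outgoing links breadth-first
    return records


# Hypothetical toy "Web": url -> (page text, outgoing links).
WEB = {
    "tv/listing": ("Tonight: Science Hour. Guest: Ada Lovelace", ["tv/show", "news/a"]),
    "tv/show": ("Science Hour episode notes. Guest: Alan Turing", ["news/b"]),
    "news/a": ("Weather report: sunny", []),
    "news/b": ("Science Hour recap. Guest: Grace Hopper", []),
}


def fetch(url: str) -> str:
    return WEB[url][0]


def links(url: str) -> List[str]:
    return WEB[url][1]


guest_rule = regex_rule(r"Guest: (?P<guest>[A-Z][a-z]+ [A-Z][a-z]+)")
laser_task = WebIETask("tonight-guest", guest_rule, urls=["tv/listing"])
diffusion_task = WebIETask("show-guests", guest_rule)

if __name__ == "__main__":
    print(laser_webie(laser_task, fetch))
    print(diffusion_webie(diffusion_task, fetch, links, ["tv/listing"],
                          lambda text: "Science Hour" in text))
```

With this toy data, the Laser task reads only `tv/listing` and returns one guest record, while the Diffusion task crawls all four pages, skips the off-topic weather page, and returns three guest records. The same rule object serves both tasks, which mirrors the abstract's point that tasks are instantiations of reusable rules.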

Pages: A389-A392

Page count: 4