Web Information Extraction for content augmentation

Cited by: 0

Authors:

Janevski, A [1]

Dimitrova, N [1]

Affiliations:

[1] Philips Res USA, Briarcliff Manor, NY 10510 USA

Source:

IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL I AND II, PROCEEDINGS | 2002

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Today, users must cope with an overwhelming number of TV channels and Web content sources. We introduce automatic content augmentation, a novel approach to contextual information extraction on behalf of the user, in which the context is provided by the primary content source (i.e., the TV channel) and tailored to the user's preferences. A key aspect of this approach is Web Information Extraction (WebIE), which automatically derives structured information from unstructured Web documents. Our system executes WebIE tasks, each an instantiation of WebIE rules, our generic document processors. We present two WebIE approaches: Diffusion WebIE, which crawls a wide set of Web pages and extracts information from a subset of the pertinent pages; and Laser WebIE, which accesses a select set of Web pages and extracts narrowly defined information. We describe the architecture and implementation details of the system and provide detailed Laser WebIE examples.


Pages: A389-A392

Page count: 4

