The Research of Web Parallel Information Extraction Based on Hadoop

Cited by: 0

Authors:

Ma, Songyu [1]

Shi, Quan [1]

Xu, Lu [1]

Affiliation:

[1] Nantong Univ, Sch Comp Sci & Technol, Nantong 226019, Jiangsu, Peoples R China

Source:

PROCEEDINGS OF INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY (CSAIT 2013) | 2014, Vol. 255

Funding:

National Natural Science Foundation of China;

Keywords:

Hadoop; Web information extraction; Crawler; Parallel indexing;

DOI:

10.1007/978-81-322-1759-6_41

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Big data, driven by three major trends (cloud computing, social computing, and mobile computing), is reshaping business processes, IT infrastructure, and the way enterprise, customer, and Internet information is captured and used. To extract big data from the Internet, an enterprise needs a scalable, flexible, and manageable data infrastructure. This paper therefore uses the Hadoop framework to analyze and design a large-scale data information extraction system. Measurements show that cluster-based extraction of huge amounts of data greatly improves performance over single-machine extraction, with high reliability and scalability. Moreover, this research will provide better technical solutions for Web information extraction and sensitive information.

Pages: 341-348

Page count: 8

50 items in total

[1] Research on Web Information Extraction Based on XML
Hu, Yan
Xuan, Yanyan
[J]. SECOND INTERNATIONAL CONFERENCE ON GENETIC AND EVOLUTIONARY COMPUTING: WGEC 2008, PROCEEDINGS, 2008, : 201 - 204
[2] Hadoop Based Parallel Deduplication Method for Web Documents
Song, Junjie
Liu, Jin
Zheng, Yuhui
[J]. ADVANCES IN COMPUTER SCIENCE AND UBIQUITOUS COMPUTING, 2018, 474 : 499 - 504
[3] Research on web character information extraction based on semantic similarity
Wang, Bao-Cheng
Huang, Wei
Li, Zhong-Ren
Xiao, Ke
[J]. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMMUNICATION AND ELECTRONIC INFORMATION ENGINEERING (CEIE 2016), 2016, 116 : 663 - 670
[4] Research on Information Extraction Based on Web Table Structure and Ontology
Wang, Xiaofeng
[J]. MECHATRONICS AND INDUSTRIAL INFORMATICS, PTS 1-4, 2013, 321-324 : 2254 - 2259
[5] A Research of the Internet Based on Web Information Extraction and Data Fusion
Jiang, Yajun
Wu, Zaoliang
Zhan, Zengrong
Xu, Lingyu
[J]. NEW HORIZONS IN WEB-BASED LEARNING: ICWL 2010 WORKSHOPS, 2011, 6537 : 195 - 206
[6] Research of Web information extraction MAS model based on KPS
Duan Longzhen
Qian Jun
Huang Shuiyuan
Yu Jing
Zhang Hejiang
[J]. ADVANCED COMPUTER TECHNOLOGY, NEW EDUCATION, PROCEEDINGS, 2007, : 520 - 524
[7] Research on PageRank Algorithm parallel computing Based on Hadoop
Yang, Pengfei
Zhou, Liqing
[J]. Proceedings of the 2016 4th International Conference on Mechanical Materials and Manufacturing Engineering (MMME 2016), 2016, 79 : 182 - 185
[8] Research on the System of Public Opinion-Monitoring for Internet Based on Hadoop and Information Extraction Technology
Nie, Peiyao
Hu, Yaobin
Geng, Changxin
Lin, Peiguang
[J]. EMERGING RESEARCH IN ARTIFICIAL INTELLIGENCE AND COMPUTATIONAL INTELLIGENCE, 2012, 315 : 234 - 242
[9] Research on method of learning web information extraction rule based on XPATH
Hu, Yan
Xuan, Yanyan
[J]. DCABES 2007 PROCEEDINGS, VOLS I AND II, 2007, : 897 - 899
[10] Research on the Application of Web Information Extraction Based On Semi Structured XML
Yang, Guo-Jun
[J]. 2016 INTERNATIONAL CONFERENCE ON SERVICE SCIENCE, TECHNOLOGY AND ENGINEERING (SSTE 2016), 2016, : 317 - 323