A STRUCTURAL APPROACH TO EXTRACTING CHINESE POSITION RELATIONS FROM WEB PAGES

Cited by: 0

Authors:

Jin, Peiquan [1]

Yang, Jia [1]

Zhao, Jie [2]

Liu, Yanhong [1]

Affiliations:

[1] Univ Sci & Technol China, Hefei, Anhui, Peoples R China

[2] Anhui Univ, Hefei, Peoples R China

Source:

JOURNAL OF WEB ENGINEERING | 2013 / Vol. 12 / Issue 05

Funding:

National Science Foundation (United States);

Keywords:

Position Relation; Relation Extraction; Structural File Segment;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification codes:

081202 ; 0835 ;

Abstract:

The use of position relations, which describe the positions people hold in an organization, can serve enterprises as a significant competitive intelligence method. The rapid growth of data on the Web brings new opportunities to extract position relations of interest. In this paper, we propose a new algorithm to extract position relations from the Web. Our algorithm is based on the structural feature of position relations on the Web, i.e., a position relation is usually presented in Web pages as a table or a list. To capture this structural feature, we first introduce a structural coefficient for each Web page, which is then used to generate structural file segments for Web pages. A structural file segment consists of all position-relation candidates that share a similar structure. We then employ a pattern-matching method to extract position relations from the structural file segments. Finally, we conduct experiments on a real data set of 6028 Chinese Web pages gathered through the Baidu search engine and evaluate the precision and recall of our approach. The experimental results confirm that our algorithm achieves a precision over 96% and a recall over 87%.
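
The pipeline described in the abstract (score how structured a page is, collect table rows and list items as candidates, then pattern-match them) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' algorithm: the abstract does not define the structural coefficient, so the sketch substitutes a hypothetical one (the share of page text that sits inside `<tr>` or `<li>` elements). The names `RowCollector`, `extract_relations`, `POSITION_TITLES`, and the threshold `min_coefficient` are all invented for this example.

```python
import re
from html.parser import HTMLParser

# Hypothetical title lexicon; the paper's actual patterns are not given here.
POSITION_TITLES = ["董事长", "总经理", "副总经理", "首席执行官"]


class RowCollector(HTMLParser):
    """Collect the text of each <tr> or <li> as one candidate record."""

    def __init__(self):
        super().__init__()
        self.rows = []             # finished records, each a list of text cells
        self._cells = None         # cells of the record being read, or None
        self.total_chars = 0       # all visible text characters seen
        self.structured_chars = 0  # text characters inside <tr> or <li>

    def handle_starttag(self, tag, attrs):
        if tag in ("tr", "li"):
            self._cells = []

    def handle_endtag(self, tag):
        if tag in ("tr", "li") and self._cells is not None:
            if self._cells:
                self.rows.append(self._cells)
            self._cells = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        self.total_chars += len(text)
        if self._cells is not None:
            self._cells.append(text)
            self.structured_chars += len(text)


def extract_relations(html, min_coefficient=0.5):
    """Return (person, position) pairs found in table rows or list items.

    Assumption: pages whose structured-text share falls below
    min_coefficient are skipped. This stands in for the paper's
    structural coefficient, whose real definition is not in the abstract.
    """
    collector = RowCollector()
    collector.feed(html)
    collector.close()
    if collector.total_chars == 0:
        return []
    coefficient = collector.structured_chars / collector.total_chars
    if coefficient < min_coefficient:
        return []

    titles = "|".join(re.escape(t) for t in POSITION_TITLES)
    # Two simple layouts: "name title" and "title[:] name".
    name_first = re.compile(rf"^(?P<name>\S+)\s+(?P<title>{titles})$")
    title_first = re.compile(rf"^(?P<title>{titles})\s*[:：]?\s*(?P<name>\S+)$")

    relations = []
    for cells in collector.rows:
        row_text = " ".join(cells)
        match = name_first.match(row_text) or title_first.match(row_text)
        if match:
            relations.append((match.group("name"), match.group("title")))
    return relations
```

Calling `extract_relations` on a page whose main content is a two-column table of names and titles would yield one pair per data row, while header rows and unstructured prose produce nothing. A real system would need a far richer title lexicon and name recognition than this toy pattern set.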


Pages: 363-382

Page count: 20
