Extracting structures of HTML documents using a high-level stack machine

Cited by: 0

Authors:

Lim, SJ [1]

Ng, YK [1]

Affiliations:

[1] Brigham Young Univ, Provo, UT 84602 USA

Source:

INFORMATION NETWORKING IN ASIA | 2001 / Vol. 3

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Information on the Web, which is a conglomeration of heterogeneous data such as text, images, and audio clips, is often accessed through documents written in HTML. According to the HTML specification, HTML documents are semistructured in nature. We propose a high-level stack machine (HSM) which accesses an HTML document through its URL and constructs a semistructured data graph (SDG) of the document. The SDG of an HTML document H precisely captures the structure of the semistructured data embedded in H, based on the dependency relationship among the data objects in H. HSM is configurable to accommodate a user's interest in particular HTML elements in H, which are the elements considered during the construction of the SDG of H.


Pages: 177-188

Page count: 12
