Extracting structures of HTML documents using a high-level stack machine

Times cited: 0
Authors
Lim, SJ [1]
Ng, YK [1]
Affiliation
[1] Brigham Young Univ, Provo, UT 84602 USA
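Source
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract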
Information on the Web, a conglomeration of heterogeneous data such as text, images, and audio clips, is often accessed through documents written according to the HTML specification. Under that specification, HTML documents are semistructured in nature. We propose a high-level stack machine (HSM) that accesses an HTML document through its URL and constructs a semistructured data graph (SDG) of the document. The SDG of an HTML document H precisely captures the structure of the semistructured data embedded in H, based on the dependency relationships among the data objects in H. HSM is configurable to match a user's interest: the user selects which HTML elements of H are considered while the SDG of H is constructed.
Pages: 177-188
Number of pages: 12