SEMANTIC MINING OF DOCUMENTS IN A RELATIONAL DATABASE

Cited by: 0
Authors
Mukerjee, Kunal [1 ]
Porter, Todd [1 ]
Gherman, Sorin [1 ]
Affiliation
[1] Microsoft, SQL Server RDBMS, Redmond, WA 98052 USA
Keywords
Semantic mining; Documents; Full text search; SQL Server;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automatically mining entities, relationships, and semantics from unstructured documents and storing them in relational tables greatly simplifies and unifies the workflows and user experiences of database products in the enterprise. This paper describes three linear-scale, incremental, and fully automatic semantic mining algorithms that form the foundation of the new Semantic Platform being released in the next version of SQL Server. The target workload is large (10 to 100 million) enterprise document corpora. At these scales, anything short of linear-scale and incremental processing is costly to deploy. The three algorithms give rise to three weighted physical indexes, which are then queryable through the SQL interface: the Tag Index (top keywords in each document); the Document Similarity Index (top closely related documents, given any document); and the Phrase Similarity Index (top semantically related phrases, given any phrase). The need for these three indexes was motivated by observing the typical stages of document research and by a gap analysis of current enterprise tools and technology. We describe the mining algorithms and architecture, and outline some compelling user experiences that these indexes enable.
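To make the shape of the three indexes concrete, the following is a minimal toy sketch, not the paper's algorithms. It assumes smoothed TF-IDF weights and cosine similarity as stand-ins for the unspecified mining methods, uses single words in place of phrases, and is neither linear-scale nor incremental. All names (`DOCS`, `build_indexes`, and so on) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical three-document corpus, for illustration only.
DOCS = {
    "d1": "database index query index",
    "d2": "database query optimizer",
    "d3": "semantic phrase mining",
}

def tokenize(text):
    """Lowercase whitespace tokenizer; single words stand in for phrases."""
    return [w for w in text.lower().split() if w.isalpha()]

def tfidf_vectors(docs):
    """Map doc_id -> {term: weight} using a smoothed TF-IDF (an assumption)."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    df = Counter(t for toks in tokens.values() for t in set(toks))
    n = len(docs)
    return {
        doc_id: {
            t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t, c in Counter(toks).items()
        }
        for doc_id, toks in tokens.items()
    }

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(scored, k):
    """Keep the k highest-weighted (key, weight) pairs; ties broken by key."""
    return sorted(scored, key=lambda p: (-p[1], p[0]))[:k]

def build_indexes(docs, k=2):
    vecs = tfidf_vectors(docs)
    # Tag Index: top weighted keywords for each document.
    tag_index = {d: top_k(v.items(), k) for d, v in vecs.items()}
    # Document Similarity Index: top related documents for each document.
    doc_sim_index = {
        d: top_k([(o, cosine(v, vecs[o])) for o in vecs if o != d], k)
        for d, v in vecs.items()
    }
    # Phrase Similarity Index: terms are compared by the documents they occur in.
    term_vecs = {}
    for d, v in vecs.items():
        for t, w in v.items():
            term_vecs.setdefault(t, {})[d] = w
    phrase_sim_index = {
        t: top_k([(o, cosine(v, term_vecs[o])) for o in term_vecs if o != t], k)
        for t, v in term_vecs.items()
    }
    return tag_index, doc_sim_index, phrase_sim_index

tag_index, doc_sim_index, phrase_sim_index = build_indexes(DOCS, k=2)
```

Each index maps a key (a document or a term) to a short, weight-ordered list, which is the kind of row set that could be stored in a relational table and joined against through SQL.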
Pages: 146-158
Page count: 13