Mining the SDSS Sky Server SQL Queries Log

Authors
Hirota, Vitor Makiyama [1 ]
Santos, Rafael [1 ]
Raddick, Jordan [2 ]
Thakar, Ani [2 ]
Affiliations
[1] Natl Inst Space Res, Ave Astronautas 1758, Sao Paulo, Brazil
[2] Johns Hopkins Univ, 3400 N Charles St, Baltimore, MD USA
Source
NEXT-GENERATION ANALYST IV | 2016 / Vol. 9851
Keywords
Text Mining; SQL; Web Logs;
DOI
10.1117/12.2224237
Chinese Library Classification (CLC) number
O43 [Optics];
Discipline classification code
070207; 0803;
Abstract
SkyServer, the Internet portal for the Sloan Digital Sky Survey (SDSS) astronomical catalog, provides a set of tools that allow data access for astronomers and for science education. One of SkyServer's data access interfaces allows users to enter ad hoc SQL statements to query the catalog. SkyServer also presents some template queries that can be used as a basis for more complex queries. This interface has logged over 330 million queries submitted since 2001. Analysis of this data is expected to be useful, for example, to investigate usage patterns, identify potential new classes of queries, and find similar queries, and to shed some light on how users interact with the Sloan Digital Sky Survey data and how scientists have adopted the new paradigm of e-Science, which could in turn lead to enhancements to the user interfaces and to the user experience in general. In this paper we review some approaches to SQL query mining, apply the traditional techniques used in the literature, and present lessons learned: the general text mining approach to feature extraction and clustering does not seem to be adequate for this type of data and, most importantly, this type of analysis can result in very different queries being clustered together.
Pages: 13
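
The abstract's closing observation, that a general text mining approach can group very different queries together, can be illustrated with a minimal sketch. This is not the authors' pipeline: the tokenizer, the term-frequency features, the cosine measure, and the three example queries are all assumptions chosen for illustration. It shows only that a bag-of-words representation discards where each token appears in a SQL statement.

```python
import math
import re
from collections import Counter


def tokenize(sql):
    """Lowercase a SQL string and keep identifier-like tokens only.

    Assumption: a deliberately naive tokenizer; numbers, operators and
    punctuation are dropped, as a generic text-mining step might do.
    """
    return re.findall(r"[a-z_][a-z0-9_]*", sql.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical queries written for this example, not taken from the log.
q1 = "SELECT ra FROM PhotoObj WHERE dec > 10"
q2 = "SELECT dec FROM PhotoObj WHERE ra > 10"   # selects and filters on swapped columns
q3 = "SELECT COUNT(*) FROM SpecObj"

v1, v2, v3 = (Counter(tokenize(q)) for q in (q1, q2, q3))

sim_12 = cosine(v1, v2)  # same bag of tokens, so similarity is close to 1.0
sim_13 = cosine(v1, v3)  # only the keywords overlap, so similarity is lower

print(f"q1 vs q2: {sim_12:.3f}")
print(f"q1 vs q3: {sim_13:.3f}")
```

Under these assumptions, q1 and q2 receive near-identical feature vectors even though they return different columns under different constraints, which is one way queries with different meanings could end up in the same cluster.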