CiteSeerX: AI in a Digital Library Search Engine

Times cited: 0
Authors
Wu, Jian [1]
Williams, Kyle [1]
Chen, Hung-Hsuan [1]
Khabsa, Madian [2]
Caragea, Cornelia [3]
Ororbia, Alexander [1]
Jordan, Douglas [2]
Giles, C. Lee [1, 2]
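Affiliations
[1] Pennsylvania State University, Information Sciences and Technology, University Park, PA 16802, USA
[2] Pennsylvania State University, Computer Science and Engineering, University Park, PA 16802, USA
[3] University of North Texas, Computer Science and Engineering, Denton, TX 76203, USA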
Funding
National Science Foundation (NSF)
Keywords
(none listed)
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
CiteSeerX is a digital library search engine that provides access to more than 4 million academic documents, with nearly a million users and millions of hits per day. Artificial intelligence (AI) technologies are used in many components of CiteSeerX, e.g., to accurately extract metadata, intelligently crawl the web, and ingest documents. We present key AI technologies used in the following components: document classification and deduplication, document and citation clustering, automatic metadata extraction and indexing, and author disambiguation. These AI technologies have been developed by CiteSeerX group members over the past five to six years. We also describe their usage status, payoff, development challenges, main design concepts, and deployment and maintenance requirements. While it is challenging to rebuild a system like CiteSeerX from scratch, many of these AI technologies are transferable to other digital libraries and/or search engines.
Pages: 2930-2937
Page count: 8
Related papers
50 in total
  • [1] CiteSeerX: AI in a Digital Library Search Engine
    Wu, Jian
    Williams, Kyle
    Chen, Hung-Hsuan
    Khabsa, Madian
    Caragea, Cornelia
    Tuarob, Suppawong
    Ororbia, Alexander
    Jordan, Douglas
    Mitra, Prasenjit
    Giles, C. Lee
    [J]. AI MAGAZINE, 2015, 36 (03) : 35 - 48
  • [2] Web Crawler Middleware for Search Engine Digital Libraries: A Case Study for CiteSeerX
    Wu, Jian
    Teregowda, Pradeep
    Khabsa, Madian
    Carman, Stephen
    Jordan, Douglas
    Wandelmer, Jose San Pedro
    Lu, Xin
    Mitra, Prasenjit
    Giles, C. Lee
    [J]. PROCEEDINGS OF THE TWELFTH INTERNATIONAL WORKSHOP ON WEB INFORMATION AND DATA MANAGEMENT, 2012, : 57 - 64
  • [3] A Figure Search Engine Architecture for a Chemistry Digital Library
    Choudhury, Sagnik Ray
    Tuarob, Suppawong
    Mitra, Prasenjit
    Rokach, Lior
    Kirk, Andi
    Szep, Silvia
    Pellegrino, Donald
    Jones, Sue
    Giles, C. Lee
    [J]. JCDL'13: PROCEEDINGS OF THE 13TH ACM/IEEE-CS JOINT CONFERENCE ON DIGITAL LIBRARIES, 2013, : 369 - 370
  • [4] The Impact of User Corrections On A Crawl-Based Digital Library: A CiteSeerX Perspective
    Wu, Jian
    Williams, Kyle
    Khabsa, Madian
    Giles, C. Lee
    [J]. 2014 INTERNATIONAL CONFERENCE ON COLLABORATIVE COMPUTING: NETWORKING, APPLICATIONS AND WORKSHARING (COLLABORATECOM), 2014, : 171 - 176
  • [5] The research about Intelligent Search Engine and in Digital Library personalization services
    Fu Junhui
    Lv Jingqiao
    Li Xueyong
    [J]. SMART MATERIALS AND INTELLIGENT SYSTEMS, PTS 1 AND 2, 2011, 143-144 : 333 - +
  • [6] Digital Library Engine: Adapting Digital Library for Cloud Computing
    Lu, Weiming
    Zheng, Liangju
    Shao, Jian
    Wei, Baogang
    Zhuang, Yueting
    [J]. 2013 IEEE SIXTH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING (CLOUD 2013), 2013, : 934 - 941
  • [7] An XQuery engine for digital library systems
    Kang, JH
    Kim, CS
    Ko, EJ
    [J]. 2003 JOINT CONFERENCE ON DIGITAL LIBRARIES, PROCEEDINGS, 2003, : 400 - 400
  • [8] Generative AI and Web Search Engine Developments
    Ojala, Marydee
    [J]. Computers in Libraries, 2023, 43 (02) : 43 - 44
  • [9] Google programmable search engine as library's integrated search service
    Sobolevskaya, Yulia, V
    [J]. NAUCHNYE I TEKHNICHESKIE BIBLIOTEKI-SCIENTIFIC AND TECHNICAL LIBRARIES, 2024, (08) : 62 - 77
  • [10] Google Search Engine and its Usefulness to Library Professionals
    Jain, Vivekanand
    Saraf, Sanjiv
    [J]. DESIDOC JOURNAL OF LIBRARY & INFORMATION TECHNOLOGY, 2006, 26 (05) : 23 - 28